Uses several neural networks to produce responses as Telegram video notes.
For educational purposes. Inspired by hackdaddy8000/unsuperior-ai-waifu
- Textual model: OpenAI GPT-3 or EleutherAI/gpt-j-6B
- Voice model: speechbrain/tts-tacotron2-ljspeech + speechbrain/tts-hifigan-ljspeech
- Emotion classifier: j-hartmann/emotion-english-distilroberta-base
- Talking face model: Live2D Haru
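As a rough sketch of how the models listed above can be chained (the prompt, file names, and wiring are illustrative; the actual app code may differ):

```python
# Sketch: text reply -> emotion label -> synthesized speech.
# Model names match the list above; everything else is illustrative.
import torchaudio
from transformers import pipeline
from speechbrain.pretrained import Tacotron2, HIFIGAN

# Text model: GPT-J via transformers (GPT-3 would go through the
# OpenAI API instead). GPT-J needs a GPU with roughly 15+ GB of VRAM.
generator = pipeline("text-generation", model="EleutherAI/gpt-j-6B")

# Emotion classifier: its label drives the avatar's expression.
classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

# TTS: Tacotron2 predicts a mel spectrogram, HiFi-GAN vocodes it to audio.
tacotron2 = Tacotron2.from_hparams(
    source="speechbrain/tts-tacotron2-ljspeech", savedir="tts_tacotron2"
)
hifi_gan = HIFIGAN.from_hparams(
    source="speechbrain/tts-hifigan-ljspeech", savedir="tts_hifigan"
)

prompt = "Hello! How are you today?"
reply = generator(prompt, max_new_tokens=40)[0]["generated_text"]
emotion = classifier(reply)[0]["label"]  # e.g. "joy"

mel_outputs, mel_lengths, alignments = tacotron2.encode_text(reply)
waveforms = hifi_gan.decode_batch(mel_outputs)
torchaudio.save("reply.wav", waveforms.squeeze(1).cpu(), 22050)  # LJSpeech rate

print(emotion)  # would select the matching Live2D expression/motion
```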
- Linux with an X server
- Python 3.8+
- OpenAI token (for GPT-3) or a GPU with 15+ GB VRAM (for GPT-J; see the loading sketch after this list)
- Live2D Samples
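The VRAM figure corresponds to GPT-J's 6B parameters; the model card publishes a `float16` revision whose weights take roughly half the memory of full precision. A minimal loading sketch, assuming a CUDA device:

```python
# Sketch: loading GPT-J in half precision so it fits in ~15 GB of VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",         # fp16 weights from the model card
    torch_dtype=torch.float16,
).to("cuda")
```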
- Set up a Python virtual environment with Hugging Face Transformers installed (see the Transformers installation doc)
- Install the Python requirements:
```
python3 -m pip install -r requirements.txt
```
- Clone Live2D Samples and apply the git patch.
- Build the `Samples/OpenGL/Demo/proj.linux.cmake` project.
Provide the following environment variables:

| ENV | Optional? | Description |
|---|---|---|
| BOT_TOKEN | No | Token for the Telegram Bot API |
| OPENAI_TOKEN | Yes (if GPT-J is used instead of GPT-3) | Token for GPT-3 |
| DISPLAY | No | X display variable, used for OpenGL rendering |
| LIVE2D_EXECUTABLE | No | Path to the Live2D demo executable |
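For illustration, a sketch of how these variables might be consumed; the launch logic and variable names below are assumptions, not the app's actual code:

```python
# Sketch: reading the environment and spawning the Live2D renderer.
import os
import subprocess

bot_token = os.environ["BOT_TOKEN"]            # required
openai_token = os.environ.get("OPENAI_TOKEN")  # optional when GPT-J is used
display = os.environ["DISPLAY"]                # X display for OpenGL rendering
live2d_executable = os.environ["LIVE2D_EXECUTABLE"]

# The demo renders via OpenGL, so DISPLAY must reach the child process.
renderer = subprocess.Popen([live2d_executable], env=os.environ.copy())
```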
Run the app:
```
python3 -m virtualfriend
```
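As a usage illustration, a rendered clip can be delivered through the Bot API's standard `sendVideoNote` method; whether the app uses this exact delivery path is an assumption, and the chat id and file name below are placeholders:

```python
# Sketch: sending a rendered clip as a Telegram video note.
import os
import requests

def send_video_note(chat_id: int, path: str) -> None:
    url = f"https://api.telegram.org/bot{os.environ['BOT_TOKEN']}/sendVideoNote"
    with open(path, "rb") as clip:
        response = requests.post(
            url, data={"chat_id": chat_id}, files={"video_note": clip}
        )
    response.raise_for_status()

send_video_note(123456789, "reply.mp4")  # placeholder chat id and file
```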