This is a modification of the Cartesia Voice Agent using faster_whisper, ollama, and xtts for local inference.
The example includes a custom Next.js frontend, a Python agent, and a modified version of xtts-streaming-server.
- Node.js
- Python 3.9-3.12
- LiveKit Cloud account (or OSS LiveKit server)
- Ollama (for LLM)
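Before installing anything, it can help to confirm that the prerequisites above are present. The following is a minimal sketch, not part of the project: the executable names `node`, `npm`, and `ollama` are assumptions about how these tools appear on your PATH, and the script cannot check your LiveKit account or server.

```python
import shutil
import sys

# Supported interpreter range, taken from the prerequisites list (3.9 to 3.12).
MIN_PYTHON = (3, 9)
MAX_PYTHON = (3, 12)

# Assumed executable names for Node.js and Ollama; adjust for your system.
REQUIRED_TOOLS = ["node", "npm", "ollama"]


def python_version_supported(version):
    """Return True if the (major, minor) pair falls inside the supported range."""
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON


def missing_tools(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if not python_version_supported(sys.version_info):
        print(f"Unsupported Python {sys.version_info.major}.{sys.version_info.minor}")
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"Not found on PATH: {tool}")
```

Run it with the same interpreter you plan to use for the virtual environments; no output means every check passed.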
Copy .env.example to .env.local and set the environment variables. Then run:
cd frontend
npm install
npm run dev
Copy .env.example to .env and set the environment variables. Then run:
cd agent
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py dev
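A common setup mistake is leaving a variable from .env.example unset. The sketch below is a hypothetical helper, not part of the project, that compares the two files and lists keys that are missing or empty in .env. It assumes simple `KEY=VALUE` lines with `#` comments and does not handle `export` prefixes or multi-line values.

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(example_text, actual_text):
    """Return keys listed in the example that are missing or empty in the actual file."""
    actual = parse_env(actual_text)
    return [key for key in parse_env(example_text) if not actual.get(key)]


if __name__ == "__main__":
    example = Path(".env.example")
    actual = Path(".env")
    if example.exists() and actual.exists():
        for key in unset_keys(example.read_text(), actual.read_text()):
            print(f"Not set in .env: {key}")
```

Run it from the agent directory. For the frontend, the same check applies with `.env.local` in place of `.env`.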
To run the xtts server:
cd server
python3 -m venv venv
source venv/bin/activate
pip install --use-deprecated=legacy-resolver -r requirements.txt
python -m unidic download
python main.py