This Python web server provides a WebRTC interface that lets users talk to a Large Language Model (LLM) by voice. The server combines several libraries and models to handle WebRTC, voice activity detection, speech-to-text, natural language processing, and text-to-speech (a rough sketch of this pipeline follows the feature list).
- WebRTC Support: Uses `aiortc` for real-time audio streaming.
- Voice Activity Detection: Utilizes Silero VAD to detect when the user starts and stops speaking.
- Speech-to-Text: Integrates Whisper for high-quality transcription of spoken words.
- Natural Language Processing: Implements Llama 3.1 8B Instruct for understanding and generating responses.
- Text-to-Speech: Uses MeloTTS to convert responses back to speech.
- Optimized for Mac: Employs the mlx versions of the models for optimized performance on Mac systems.
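
The pipeline these features form can be sketched roughly as follows. This is a minimal illustration rather than the code in `server.py`: the model repositories, the `silero-vad` helper API, the chat formatting, and the `EN-US` speaker id are all assumptions, and the real server streams audio over WebRTC instead of processing a single buffer.

```python
# Rough sketch of the voice pipeline: VAD -> Whisper -> Llama -> MeloTTS.
# Everything here is illustrative; server.py may wire these pieces differently.
import numpy as np
import torch
import mlx_whisper
from mlx_lm import load, generate
from silero_vad import load_silero_vad, get_speech_timestamps
from melo.api import TTS

vad_model = load_silero_vad()
llm, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")  # assumed repo
tts = TTS(language="EN", device="cpu")


def respond_to_audio(pcm: np.ndarray, out_path: str = "reply.wav") -> str:
    """Turn a mono 16 kHz float32 speech buffer into a spoken reply."""
    # 1. Voice activity detection: skip buffers that contain no speech.
    if not get_speech_timestamps(torch.from_numpy(pcm), vad_model):
        return ""

    # 2. Speech-to-text with an mlx build of Whisper (repo name is an assumption).
    text = mlx_whisper.transcribe(
        pcm, path_or_hf_repo="mlx-community/whisper-large-v3-mlx"
    )["text"].strip()

    # 3. Generate a reply with Llama 3.1 8B Instruct via mlx_lm.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": text}],
        add_generation_prompt=True,
        tokenize=False,
    )
    reply = generate(llm, tokenizer, prompt=prompt, max_tokens=256)

    # 4. Text-to-speech with MeloTTS; the "EN-US" speaker id is an assumption.
    tts.tts_to_file(reply, tts.hps.data.spk2id["EN-US"], out_path)
    return reply
```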
- Python 3.8 or higher
- `pipenv` for dependency management
- Dependencies listed in the `Pipfile`
- Access to the mlx versions of the models
- Clone the repository:

  ```bash
  git clone https://github.com/paulingalls/versey-ai.git
  cd versey-ai
  ```
- Make sure `pipenv` is installed:

  ```bash
  pip install pipenv --user
  ```
- Install the required packages:

  ```bash
  pipenv install
  pipenv run python -m unidic download
  ```
- Start the server:

  ```bash
  pipenv run python ./server.py
  ```
- Wait until it says the server is ready (this can take a little time while it downloads the model files).
- Open a browser and navigate to `http://localhost:8080` to interact with the LLM via the WebRTC interface.
- Click the start button.
- Wait until it says `-open` in the data channel (this can take a while the first time, as it downloads the models); a server-side sketch of this handshake follows this list.
- Start talking.
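
For orientation, the WebRTC handshake behind these steps can be sketched with `aiortc` and `aiohttp` roughly as below. The `/offer` route, the exact data-channel message text, and the overall structure are assumptions made for illustration; the actual wiring lives in `server.py`.

```python
# Minimal sketch of a WebRTC offer endpoint and data-channel notification.
# The /offer path and the message text are assumptions, not server.py's API.
from aiohttp import web
from aiortc import RTCPeerConnection, RTCSessionDescription

pcs = set()  # keep peer connections alive for the lifetime of the app


async def offer(request: web.Request) -> web.Response:
    params = await request.json()
    pc = RTCPeerConnection()
    pcs.add(pc)

    @pc.on("datachannel")
    def on_datachannel(channel):
        # Signal readiness over the data channel (exact message text is an assumption).
        channel.send("open")

    @pc.on("track")
    def on_track(track):
        # Incoming microphone audio would be fed to the VAD/STT pipeline here.
        pass

    await pc.setRemoteDescription(
        RTCSessionDescription(sdp=params["sdp"], type=params["type"])
    )
    await pc.setLocalDescription(await pc.createAnswer())
    return web.json_response(
        {"sdp": pc.localDescription.sdp, "type": pc.localDescription.type}
    )


app = web.Application()
app.router.add_post("/offer", offer)

if __name__ == "__main__":
    web.run_app(app, port=8080)
```

In this sketch the browser page would create the SDP offer, POST it to `/offer`, and wait for the data-channel message before it starts streaming microphone audio.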
Contributions are welcome! Please submit a pull request or open an issue to discuss any changes or enhancements.
This project is licensed under the MIT License. See the `LICENSE` file for more details.