Python WebRTC Server for Voice Interaction with LLM

This Python web server provides a WebRTC interface that lets users talk to a Large Language Model (LLM) by voice. It combines several libraries and models to handle WebRTC audio streaming, voice activity detection, speech-to-text, response generation, and text-to-speech.

Features

WebRTC Support: Uses aiortc for real-time audio streaming.
Voice Activity Detection: Utilizes Silero VAD to detect when the user starts and stops speaking.
Speech-to-Text: Integrates Whisper for high-quality transcription of spoken words.
Natural Language Processing: Uses Llama 3.1 8B Instruct to understand the transcribed speech and generate responses.
Text-to-Speech: Uses MeloTTS to convert responses back to speech.
Optimized for Mac: Employs the mlx versions of the models for optimized performance on Mac systems.
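
The features above chain together: the speech segment delimited by voice activity detection is transcribed, the transcript goes to the LLM, and the reply is synthesized back to audio. Here is a minimal sketch of that flow, not the code in server.py. The class `VoicePipeline` and the callables `transcribe`, `generate`, and `synthesize` are hypothetical stand-ins for Whisper, Llama, and MeloTTS, and the stubs at the bottom only echo text so the example runs without any models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the stages from the feature list (all callables are hypothetical)."""

    transcribe: Callable[[bytes], str]   # speech-to-text stand-in (e.g. Whisper)
    generate: Callable[[str], str]       # LLM stand-in (e.g. Llama 3.1 8B Instruct)
    synthesize: Callable[[str], bytes]   # text-to-speech stand-in (e.g. MeloTTS)

    def respond(self, utterance: bytes) -> bytes:
        """Turn one detected utterance into reply audio."""
        text = self.transcribe(utterance)
        if not text.strip():
            # Nothing intelligible was transcribed, so skip the LLM and TTS.
            return b""
        reply = self.generate(text)
        return self.synthesize(reply)


# Stub stages so the sketch runs without models: "audio" is just UTF-8 text.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    generate=lambda prompt: f"You said: {prompt}",
    synthesize=lambda text: text.encode("utf-8"),
)
```

Keeping each stage behind a plain callable means a model can be swapped or mocked without touching the WebRTC handling.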

Installation

Prerequisites

Python 3.8 or higher
pipenv for dependency management
Dependencies listed in Pipfile
Access to mlx versions of models
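
The first two prerequisites can be checked quickly before installing. This is only a sketch: the function name `check_prerequisites` is made up for illustration, and the macOS check is an assumption based on the "Optimized for Mac" note above rather than a documented hard requirement.

```python
import platform
import shutil
import sys


def check_prerequisites(min_version=(3, 8)):
    """Return a list of human-readable problems; an empty list means the basics look fine."""
    problems = []
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted} or higher is required")
    if shutil.which("pipenv") is None:
        # A --user install may place pipenv outside PATH.
        problems.append("pipenv was not found on PATH")
    if platform.system() != "Darwin":
        # Assumption: the mlx model versions are aimed at Mac systems.
        problems.append("not running on macOS; the mlx models may not work")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("warning:", problem)
```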

Steps

Clone the repository:

git clone https://github.com/paulingalls/versey-ai.git
cd versey-ai

Make sure pipenv is installed:
```
pip install pipenv --user
```

Install the required packages:

pipenv install
pipenv run python -m unidic download

Usage

Start the server:
```
pipenv run python ./server.py
```
Wait until the server reports that it is ready. This can take a while on the first run while the model files download.
Open a browser and navigate to http://localhost:8080 to interact with the LLM via the WebRTC interface.
Click the start button.
Wait until the data channel shows -open. This can also take a while the first time, as the models are downloaded.
Start talking.
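
Because the first start can be slow, it may help to poll the address from the steps above instead of watching the terminal. The sketch below uses only the standard library. The helper `wait_for_server` and its parameters are hypothetical, and an HTTP 200 response is only a rough proxy for the server's own "ready" message.

```python
import time
import urllib.request


def wait_for_server(url="http://localhost:8080", timeout=300.0, interval=2.0,
                    opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    `opener` is injectable so the function can be exercised without a network.
    Returns True if the server answered, False if the deadline was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts, and URLError all derive from OSError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server()` with the defaults waits up to five minutes for http://localhost:8080; if the model downloads take longer than that, raise `timeout`.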

Contributing

Contributions are welcome! Please submit a pull request or open an issue to discuss any changes or enhancements.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgements

aiortc
Silero VAD
Whisper
Llama
MeloTTS
