Python server that provides a WebRTC interface to allow a user to interact with an LLM via voice

paulingalls/versey-ai


Python WebRTC Server for Voice Interaction with LLM

This Python web server provides a WebRTC interface that lets users interact with a Large Language Model (LLM) by voice. It combines several libraries and models to handle WebRTC audio transport, voice activity detection, speech-to-text, response generation, and text-to-speech.

Features

  • WebRTC Support: Uses aiortc for real-time audio streaming.
  • Voice Activity Detection: Utilizes Silero VAD to detect when the user starts and stops speaking.
  • Speech-to-Text: Integrates Whisper for high-quality transcription of spoken words.
  • Natural Language Processing: Uses Llama 3.1 8B Instruct to interpret the transcribed speech and generate responses.
  • Text-to-Speech: Uses MeloTTS to convert responses back to speech.
  • Optimized for Mac: Uses the MLX versions of the models for better performance on Mac systems.
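
The features above form a chain: detect an utterance, transcribe it, generate a reply, and synthesize speech. The sketch below shows only that flow. It is not taken from server.py, and `split_utterances`, `VoicePipeline`, and the stand-in stages are hypothetical names; in the real server the stages would be Silero VAD, Whisper, Llama 3.1 8B Instruct, and MeloTTS.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


def split_utterances(frames: Iterable[bytes],
                     is_speech: Callable[[bytes], bool]) -> Iterator[bytes]:
    """Group consecutive speech frames into utterances.

    `is_speech` stands in for a voice activity detector: an utterance
    starts at the first speech frame and ends at the next non-speech frame.
    """
    current = []
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
        elif current:
            yield b"".join(current)
            current = []
    if current:  # stream ended while the user was still speaking
        yield b"".join(current)


@dataclass
class VoicePipeline:
    """Chains the three model stages; each one is a placeholder callable."""
    transcribe: Callable[[bytes], str]   # speech-to-text stage
    respond: Callable[[str], str]        # LLM stage
    synthesize: Callable[[str], bytes]   # text-to-speech stage

    def handle_utterance(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        if not text.strip():
            return b""  # nothing intelligible was said, so stay silent
        return self.synthesize(self.respond(text))


# Stand-in stages so the sketch runs without any models installed.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)

frames = [b"", b"hel", b"lo", b"", b"hi"]  # empty frames play the role of silence
replies = [pipeline.handle_utterance(u) for u in split_utterances(frames, bool)]
```

Keeping each stage behind a plain callable means a model can be swapped, or replaced with a stub for testing, without touching the control flow.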

Installation

Prerequisites

  • Python 3.8 or higher
  • pipenv for dependency management
  • Dependencies listed in Pipfile
  • Access to the MLX versions of the models (the server downloads the model files on first run)
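
A quick way to confirm the first two prerequisites before installing is a small check script. This is a minimal sketch and not part of the repository; `check_prerequisites` is a hypothetical helper, and it only checks the Python version and whether a `pipenv` executable is on the PATH.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 8)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {wanted} or higher is required, found {found}")
    if shutil.which("pipenv") is None:
        # A `--user` install may place pipenv in a directory that is not on PATH.
        problems.append("pipenv was not found on PATH")
    return problems


for problem in check_prerequisites():
    print("missing prerequisite:", problem)
```

If the script prints nothing, both checks passed.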

Steps

  1. Clone the repository:

    git clone https://github.com/paulingalls/versey-ai.git
    cd versey-ai
  2. Make sure pipenv is installed:

    pip install pipenv --user
  3. Install the required packages:

    pipenv install
    pipenv run python -m unidic download

Usage

  1. Start the server:

    pipenv run python ./server.py
  2. Wait until the server reports that it is ready. This can take a while on the first run, because it downloads the model files.

  3. Open a browser and navigate to http://localhost:8080 to interact with the LLM via the WebRTC interface.

  4. Click the start button

  5. Wait until the page shows -open for the data channel. This can take a while the first time, because the models are downloaded.

  6. Start talking
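
Because the first start can be slow, it may help to poll the address from step 3 rather than refresh the browser by hand. The helper below is a minimal sketch using only the standard library; `wait_for_server` is a hypothetical name, and it assumes the server begins answering HTTP requests on http://localhost:8080 once it is ready, which the source does not confirm.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8080", timeout=120.0, interval=2.0):
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass.

    Returns True as soon as any HTTP response arrives, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is at least up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: not listening yet, try again.
            time.sleep(interval)
    return False
```

Calling `wait_for_server()` in a second terminal after starting the server returns `True` once the page is reachable. It says nothing about whether the data channel has opened, so step 5 still applies.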

Contributing

Contributions are welcome! Please submit a pull request or open an issue to discuss any changes or enhancements.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgements
