Welcome to openai-kokoro-tts! This is a third-party application that provides an OpenAI API-compatible endpoint for generating high-quality text-to-speech (TTS) audio with Kokoro-TTS, a flexible TTS engine developed by hexgrad. Because the endpoint mirrors the OpenAI speech API, it can act as a drop-in replacement in OpenAI client applications such as open-webui, so you can integrate Kokoro-TTS into your existing workflows.
- Prerequisites
- Deployment Instructions
- Development Setup with uv
- ONNX and Transformers Usage
- API Endpoints
- Responsible Use
- Privacy Notice
- AI Disclosure
- Acknowledgments
- TODO
- License
Before getting started, ensure you have the following installed on your system:
- Git: For cloning the repository.
- Python 3.10 or newer: Required for development.
- Docker: Optional but recommended for production deployment.
git clone https://github.com/matthewhand/openai-kokoro-tts
cd openai-kokoro-tts
docker-compose up --build -d
The API will be available at http://localhost:9090.
uv is a modern tool for managing Python environments, dependencies, and project workflows. Follow these steps to set up your development environment with uv.
On macOS and Linux, run the following command to install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
Alternatively, use Homebrew:
brew install uv
On Windows, run this command in PowerShell to install uv:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
- Copy .env.example to .env:
  cp .env.example .env
- Update the .env file:
  - Set a secure value for API_KEY to protect your API endpoints:
    API_KEY=your_secure_api_key_here
  - Adjust other settings (e.g., PORT, MODEL_PATH) to suit your requirements.
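If you need a value for API_KEY, one option is to generate a random token with Python's standard secrets module. This is a minimal sketch, not part of the project: the helper name generate_api_key is hypothetical, and 32 random bytes is simply a reasonable default rather than a project requirement.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a random, URL-safe string suitable for use as API_KEY (hypothetical helper)."""
    # token_urlsafe draws num_bytes from the OS CSPRNG and base64url-encodes them
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print a line that can be pasted into .env
    print(f"API_KEY={generate_api_key()}")
```

With the default of 32 bytes the key is 43 characters long and uses only letters, digits, "-" and "_", so it can be placed in .env without quoting.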
On Ubuntu, run the included setup_models.sh script:
bash setup_models.sh
This script will:
- Verify and install required tools such as Git LFS and espeak-ng.
- Clone the Kokoro-82M model repository into the models/kokoro directory.
- Create the voices directory (if missing).
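To confirm that the tools the setup relies on are available, a small check like the following can help. This is an illustrative sketch, not the script's actual logic: the helper missing_tools is hypothetical, and the executable names (git, git-lfs, espeak-ng) are assumed to be how the tools appear on your PATH.

```python
import shutil

# Assumed executable names for Git, Git LFS, and espeak-ng
REQUIRED_TOOLS = ("git", "git-lfs", "espeak-ng")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the entries of tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```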
If you're not on Ubuntu, follow these steps to set up the models manually:
- Install Git LFS:
  - macOS:
    brew install git-lfs
    git lfs install
  - Windows: Download and install Git LFS from the Git LFS website, then run git lfs install.
- Clone the Kokoro-82M repository:
  git clone https://huggingface.co/hexgrad/Kokoro-82M models/kokoro
- Verify the directory structure: ensure the models/kokoro/kokoro-v0_19.pth file exists.
- Install system dependencies:
  - macOS:
    brew install espeak-ng
  - Windows: Install espeak-ng.
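After a manual setup, it is worth checking that the model weights were actually downloaded. If Git LFS was not active when the repository was cloned, Git checks out small pointer files (beginning with version https://git-lfs.github.com/spec/v1) instead of the real weights. The sketch below is hypothetical helper code, not part of the project; it only inspects the models/kokoro/kokoro-v0_19.pth path named in the steps above.

```python
from pathlib import Path

# First line of every Git LFS pointer file
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def weights_path(project_root: str = ".") -> Path:
    """Location of the weights file described in the setup steps."""
    return Path(project_root) / "models" / "kokoro" / "kokoro-v0_19.pth"


def check_weights(project_root: str = ".") -> str:
    """Return "missing", "lfs-pointer", or "ok" for the weights file (hypothetical helper)."""
    path = weights_path(project_root)
    if not path.is_file():
        return "missing"
    with path.open("rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    # A pointer file means the clone ran without Git LFS; run `git lfs pull` in models/kokoro
    return "lfs-pointer" if head == LFS_POINTER_PREFIX else "ok"
```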
-
(Optional) Create a Virtual Environment: Navigate to the project directory and create a
uv
project virtual environment:uv venv .venv . .venv/bin/activate
-
Sync Dependencies: Install all Python dependencies specified in the
pyproject.toml
:uv sync
-
Run the Flask Application:
uv run openai_kokoro_tts/server.py
The server will start, and the API will be available at http://localhost:8000
.
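To confirm that the server is accepting connections before pointing a client at it, a plain TCP check is enough. This is a minimal standard-library sketch; is_listening is a hypothetical helper, and it only verifies that something is listening on the port, not that the API itself is healthy. Use port 9090 for the Docker deployment.

```python
import socket


def is_listening(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and attempts the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved
        return False
```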
By default, the service is configured to use ONNX for efficient CPU-based inference. No additional setup is required.
To run the service in CPU-only mode:
docker-compose up
To leverage GPU acceleration with transformers:
- Rename the example override file:
mv docker-compose.override.yml.example docker-compose.override.yml
- Start the service with GPU support:
docker-compose up
Note: Docker Compose automatically merges docker-compose.override.yml with docker-compose.yml when the override file is present in the project directory.
Primary route for generating speech from text input. Requires an API key in the request header as a Bearer token.
- URL: /v1/audio/speech
- Method: POST
- Headers: Authorization: Bearer <API_KEY>
- Data (JSON):
  - input (string): The input text to convert to speech.
  - voice (string, optional): Voice model to use (default: "af_bella").
  - response_format (string, optional): Output audio format (default: mp3).
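A request to this endpoint can be sent from Python using only the standard library. This is a sketch based on the parameters documented above: BASE_URL, API_KEY, and the helper names build_speech_request and synthesize are illustrative placeholders, and it assumes the server returns the raw audio bytes in the response body.

```python
import json
import urllib.request

# Placeholders: adjust to your deployment (http://localhost:9090 when using Docker)
BASE_URL = "http://localhost:8000"
API_KEY = "your_secure_api_key_here"


def build_speech_request(
    text: str,
    voice: str = "af_bella",
    response_format: str = "mp3",
    base_url: str = BASE_URL,
    api_key: str = API_KEY,
) -> urllib.request.Request:
    """Build a POST /v1/audio/speech request with a Bearer token and JSON body."""
    payload = {"input": text, "voice": voice, "response_format": response_format}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, out_path: str, **kwargs) -> None:
    """Send the request and write the audio bytes from the response to out_path."""
    req = build_speech_request(text, **kwargs)
    # urlopen raises urllib.error.HTTPError for non-2xx responses (e.g. a bad API key)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

With the server running, synthesize("Hello from Kokoro!", "hello.mp3") would write the generated MP3 to disk; pass voice= or response_format= to override the defaults.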
Route for listing all available Kokoro-TTS voice models.
- URL: /v1/models
- Method: GET
- Headers: Authorization: Bearer <API_KEY>
- Response:
  - A JSON object containing an array of available models.
  - Example Response:
    { "models": [ "af", "af_bella", "af_sarah", "am_adam", "am_michael", "bf_emma", "bf_isabella", "bm_george", "bm_lewis", "af_nicole", "af_sky" ] }
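The voice list can be fetched in the same way. This sketch assumes the response has the shape of the example above (a top-level "models" array); the helper names and the BASE_URL/API_KEY values are hypothetical placeholders.

```python
import json
import urllib.request

# Placeholders: adjust to your deployment (http://localhost:9090 when using Docker)
BASE_URL = "http://localhost:8000"
API_KEY = "your_secure_api_key_here"


def build_models_request(base_url: str = BASE_URL, api_key: str = API_KEY) -> urllib.request.Request:
    """Build a GET /v1/models request carrying the Bearer token."""
    return urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def parse_models(body: bytes) -> list[str]:
    """Extract the voice names from a response shaped like {"models": [...]}."""
    return json.loads(body)["models"]


def list_models(**kwargs) -> list[str]:
    """Fetch and parse the list of available voices from a running server."""
    with urllib.request.urlopen(build_models_request(**kwargs)) as resp:
        return parse_models(resp.read())
```

Any name returned by list_models() can then be passed as the voice field of a /v1/audio/speech request.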
The openai-kokoro-tts project is designed for lawful, ethical, and responsible use. Users are prohibited from deploying this tool for:
- Misleading or impersonating individuals.
- Generating disinformation or fraudulent content.
- Violating the privacy or rights of others.
- Harassing, bullying, or otherwise harming individuals or communities.
By using this project, you agree to comply with all applicable laws and OpenAI's usage policies.
This tool processes text inputs to generate speech and does not store or infer additional data from inputs. It is the user’s responsibility to ensure compliance with data privacy regulations when using this tool, especially if processing sensitive or personal data.
Outputs generated using openai-kokoro-tts are AI-generated. Users should not misrepresent these outputs as human-generated, especially in contexts where such misrepresentation could harm others or violate ethical guidelines.
This project utilizes the Kokoro-TTS engine developed by hexgrad. We appreciate their work and contributions to the TTS community.
- ONNX CPU inference
- Transformers GPU inference
- Simplify using kokoro-onnx
This project is licensed under the MIT License.