TensorVox is an application designed to enable user-friendly and lightweight neural speech synthesis on the desktop, aimed at making such technology more accessible.
Powered mainly by TensorFlowTTS and also by Coqui-TTS, it is written in pure C++/Qt and uses the TensorFlow C API to interact with the models. This way, we can perform inference without having to install gigabytes' worth of Python libraries; all that is needed is a 100MB DLL.
Grab a copy from the releases, extract the .zip, and check the Google Drive folder for models and installation instructions.
If you're interested in using your own model, you first need to train it and then export it.
TensorVox supports models from two main repos:
- TensorFlowTTS: FastSpeech2 and Tacotron2 (both char-based and phoneme-based), and Multi-Band MelGAN. Here's a Colab notebook demonstrating how to export the LJSpeech pretrained, char-based Tacotron2 model:
- Coqui-TTS: Tacotron2 (phoneme-based IPA) and Multi-Band MelGAN, after converting from PyTorch to TensorFlow. Here's a notebook showing how to export the LJSpeech DDC model:
Those two examples should provide you with enough guidance to understand what is needed. If you're looking to train a model specifically for this purpose then I recommend TensorFlowTTS, as it is the one with the best support. As for languages, out-of-the-box support is provided for English (both Coqui and TFTTS), German and Spanish (only TensorFlowTTS); that is, you won't have to modify any code.
Currently, only Windows 10 x64 is supported, although I've heard reports of it running on 8.1.
Requirements:
- Qt Creator
- MSVC 2017 (v141) compiler
Primed build (with all provided libraries):
- Download precompiled binary dependencies and includes
- Unzip it so that the deps folder is in the same place as the .pro and main source files.
- Open the project with Qt Creator, add your compiler and compile
Note that to try your shiny new executable you'll need to download the program as described above and insert the models folder where your new build is output.
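If a fresh build fails to start or cannot find anything to load, a quick sanity check of the folder layout can help. The following is only a minimal sketch in standard C++17, not code from TensorVox itself: the helper name `missingFolders` is hypothetical, and the only details taken from the steps above are the folder names deps (next to the .pro file) and models (next to the built executable).

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not part of TensorVox): returns the names from
// `required` that do NOT exist as directories directly inside `baseDir`.
// An empty result means every expected folder was found.
std::vector<std::string> missingFolders(const fs::path& baseDir,
                                        const std::vector<std::string>& required)
{
    std::vector<std::string> missing;
    for (const std::string& name : required) {
        std::error_code ec;  // use the non-throwing overload
        if (!fs::is_directory(baseDir / name, ec)) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

For example, calling `missingFolders(buildOutputDir, {"models"})` on the directory that holds the new executable, or `missingFolders(projectDir, {"deps"})` on the directory that holds the .pro file, should return an empty vector when the layout matches the instructions. Both directory variables are placeholders for paths on your own machine.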
TODO: Add instructions for compiling from scratch.
- TensorFlow C API: https://www.tensorflow.org/install/lang_c
- CppFlow (TF C API -> C++ wrapper): https://github.com/serizba/cppflow
- AudioFile (for WAV export): https://github.com/adamstark/AudioFile
- Frameless Dark Style Window: https://github.com/Jorgen-VikingGod/Qt-Frameless-Window-DarkStyle
- JSON for modern C++: https://github.com/nlohmann/json
- r8brain-free-src (Resampling): https://github.com/avaneev/r8brain-free-src
- rnnoise (CMake version, denoising output): https://github.com/almogh52/rnnoise-cmake
- Logitech LED Illumination SDK (Mouse RGB integration): https://www.logitechg.com/en-us/innovation/developer-lab.html
- QCustomPlot: https://www.qcustomplot.com/index.php/introduction
- libnumbertext: https://github.com/Numbertext/libnumbertext
You can open an issue here or join the Discord server to discuss or ask anything.
For media, licensing, or any other formal inquiries, send an email to: [email protected]
This program itself is MIT licensed, but the models you use are subject to their own license terms. For example, if you're in Vietnam and using TensorFlowTTS models, you'll have to check here for some details.