# Ethiopian Text-to-Speech
Before we start, here is what I did.
- Download and install Ubuntu 20.04, which comes with Python 3.8.10 (download: Ubuntu 20.04).
- I recommend training only if you have a GPU with at least 8 GB of memory and 16 GB or more of RAM, and in that case you should use CUDA. If you don't have a GPU, you can use the CPU instead: where the setting reads CUDA: True, change True to False and you're good to go.
- To set up CUDA, watch this video: NVIDIA CUDA installation.
- Finally, download and install Git: download Git.
## Let's Start
- Clone the repository
git clone https://github.com/dawit3228/Ethiopa-text-to-speech.git
- Create a virtual environment
python3 -m venv name
- Activate it
source name/bin/activate
- Install everything listed in requirements.txt
pip install -r requirements.txt
- Don't forget to install TTS
pip install TTS
- I uploaded the audio (.wav) files so you can use them ➡️ Audio datasets. To download the text dataset, click the link ➡️ text Dataset
- Paste this in your terminal to install espeak
sudo apt-get install espeak
- Install the project in editable mode
pip install -e .
To train a voice on your own data, you need:
- Short audio recordings (at least 100) that are:
  - In 16-bit, mono PCM WAV format.
  - Between 1 and 10 seconds each.
  - Sampled at 22050 Hz.
  - Free of background noise and distortion as far as possible.
  - Free of long pauses of silence at the beginning, throughout the middle, and at the end.
- A transcript file (the text dataset) that indicates what text is spoken in each WAV file.
- A configuration file tailored to your data set and chosen model or vocoder (e.g. Tacotron, WaveGrad, etc.).
- A machine with a fast CPU (ideally an NVIDIA GPU with CUDA support and at least 12 GB of GPU RAM; you cannot effectively use CUDA if you have less than 8 GB of GPU RAM).
- Lots of RAM (at least 16 GB of RAM is preferable).
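The format requirements for the recordings can be checked automatically before training. The following is a minimal sketch that uses only Python's standard `wave` module; the function names `check_wav` and `write_silence` are made up for this example and are not part of TTS. It checks channel count, bit depth, sample rate and duration. It does not measure noise or silence, and `wave` raises an error for files that are not PCM.

```python
import os
import tempfile
import wave


def check_wav(path, sample_rate=22050, min_seconds=1.0, max_seconds=10.0):
    """Return a list of ways the file at `path` misses the requirements."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
        if wav.getframerate() != sample_rate:
            problems.append(f"{wav.getframerate()} Hz, expected {sample_rate} Hz")
        seconds = wav.getnframes() / wav.getframerate()
        if not min_seconds <= seconds <= max_seconds:
            problems.append(f"{seconds:.2f} s long, expected {min_seconds}-{max_seconds} s")
    return problems


def write_silence(path, seconds, sample_rate=22050, channels=1):
    """Write a silent 16-bit PCM clip; used here only to demonstrate check_wav."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * sample_rate))


# Demo on two generated clips: one that meets the format, one that does not.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    bad = os.path.join(tmp, "bad.wav")
    write_silence(good, seconds=2)
    write_silence(bad, seconds=0.5, sample_rate=16000, channels=2)
    good_problems = check_wav(good)
    bad_problems = check_wav(bad)

print(good_problems)
print(bad_problems)
```

To use it on a real data set, call `check_wav` on each .wav file in your audio folder and fix or drop the files that report problems.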
You need to prepare a configuration file that describes how your custom TTS will be configured. This file is used by multiple parts of Mozilla TTS when preparing for training, performing training, and generating audio from your custom TTS. Unfortunately, though this file is very important, the documentation for Mozilla TTS largely glosses over how to customize this file.
Customize the config.json file for your data: config.json
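Because several tools read this file, it can help to sanity-check it after editing. The sketch below is only an illustration under stated assumptions: it assumes the config is JSON that may contain `//` line comments, and that it has an `audio` section with `sample_rate` and `stats_path` keys (the latter is the `audio.stats_path` setting used in a later step). The helper names `load_commented_json` and `check_audio_settings` are hypothetical, not TTS functions.

```python
import json
import re


def load_commented_json(text):
    """Parse JSON text that may contain // line comments.

    Naive on purpose: it would also cut a string value containing "//",
    such as a URL, so treat it as a sketch rather than a full parser.
    """
    return json.loads(re.sub(r"//.*", "", text))


def check_audio_settings(config, expected_sample_rate=22050):
    """Return a list of problems found in the (assumed) audio section."""
    problems = []
    audio = config.get("audio", {})
    if audio.get("sample_rate") != expected_sample_rate:
        problems.append(
            f"audio.sample_rate is {audio.get('sample_rate')}, "
            f"expected {expected_sample_rate}"
        )
    if not audio.get("stats_path"):
        problems.append("audio.stats_path is not set")
    return problems


# A made-up config fragment with two deliberate mistakes.
sample = """
{
    "audio": {
        "sample_rate": 16000,  // should match the WAV files
        "stats_path": null
    }
}
"""
sample_problems = check_audio_settings(load_commented_json(sample))
for problem in sample_problems:
    print(problem)
```

For a real check, read your own config.json with `open(...).read()` and pass the text to `load_commented_json`.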
Now you need to prepare the .npy statistics file. Here is the command:
python3 TTS/bin/compute_statistics.py --config_path /path/to/your/project/config.json --out_path /path/to/your/project/scale_stats.npy
NOTE: Don't change scale_stats.npy; leave it as it is.
If successful, this will generate a scale_stats.npy file under /path/to/your/project/scale_stats.npy. Be sure that the path in the audio.stats_path setting of your config.json file matches this path.

## Training the Model
It's now time for the moment of truth -- it's time to start training your model!
Use this only if you have a GPU:
CUDA_VISIBLE_DEVICES="0" python3 TTS/bin/train_tacotron.py --config_path TTS/tts/configs/ljspeech_tacotron2_dynamic_conv_attn.json
To make sure you are using your GPU, open a new terminal and paste this:
nvidia-smi
If you want to follow the training logs in TensorBoard, run this with the path to your log directory:
tensorboard --logdir=add/this/to/your/path
If you don't have a GPU, use this:
python3 TTS/bin/train_tacotron.py --config_path TTS/tts/configs/ljspeech_tacotron2_dynamic_conv_attn.json
This process will take several hours, if not days. If your machine supports CUDA and has it properly configured, the process will run more quickly than if you are just relying on CPU alone.
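Which of the two training commands applies depends on whether PyTorch can actually see a CUDA device. A minimal sketch of that check, assuming PyTorch is the framework installed by the requirements; the function name `cuda_available` is made up for this example, and it simply reports CPU if PyTorch is not importable:

```python
def cuda_available():
    """Return True only if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


use_cuda = cuda_available()
print("Train on GPU" if use_cuda else "Train on CPU")
```

Run it inside the activated virtual environment, since that is where the requirements were installed.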