This repository contains code to train an end-to-end speech synthesis system. Currently only single-speaker models are supported, and the text frontend supports English.
The system consists of two parts:
- A Tacotron model with Dynamic Convolutional Attention, which modifies the hybrid location-sensitive attention mechanism to be purely location-based, as described in Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis, resulting in better generalization to long utterances. This model takes text (as a sequence of characters) as input and predicts a sequence of mel-spectrogram frames as output (the seq2seq model). A sketch of the attention mechanism follows this list.
- A WaveRNN-based vocoder, which takes the mel-spectrogram predicted in the previous step as input and generates a waveform as output (the vocoder model). A sketch of the vocoder core also follows this list.
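The defining property of Dynamic Convolutional Attention is that the attention energies are computed purely from the previous alignment: a set of fixed (static) filters and a set of filters predicted from the attention-RNN state (dynamic) are convolved with the previous attention weights. Below is a minimal PyTorch sketch of that idea; all layer sizes are illustrative, the prior filter from the paper is omitted for brevity, and this is not the repository's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConvolutionAttention(nn.Module):
    """Sketch of Dynamic Convolutional Attention (illustrative sizes only).

    Energies depend only on the previous attention weights, never on the
    encoder content; the prior filter from the paper is omitted.
    """

    def __init__(self, query_dim=1024, attn_dim=128,
                 static_channels=8, static_kernel=21,
                 dynamic_channels=8, dynamic_kernel=21):
        super().__init__()
        self.dynamic_channels = dynamic_channels
        self.dynamic_kernel = dynamic_kernel
        # Fixed (static) filters convolved over the previous alignment.
        self.static_conv = nn.Conv1d(1, static_channels, static_kernel,
                                     padding=static_kernel // 2, bias=False)
        # Dynamic filters are predicted from the attention-RNN state each step.
        self.dynamic_proj = nn.Linear(query_dim,
                                      dynamic_channels * dynamic_kernel)
        self.W_static = nn.Linear(static_channels, attn_dim, bias=False)
        self.W_dynamic = nn.Linear(dynamic_channels, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, query, prev_alignment):
        # query: [B, query_dim], prev_alignment: [B, T]
        B, T = prev_alignment.shape
        # Static location features: [B, T, static_channels].
        f = self.static_conv(prev_alignment.unsqueeze(1)).transpose(1, 2)
        # Predict per-utterance dynamic filters, then apply them to the
        # previous alignment via a grouped conv (one group per batch item).
        filters = self.dynamic_proj(query).view(
            B * self.dynamic_channels, 1, self.dynamic_kernel)
        g = F.conv1d(prev_alignment.view(1, B, T), filters,
                     padding=self.dynamic_kernel // 2, groups=B)
        g = g.view(B, self.dynamic_channels, T).transpose(1, 2)
        energies = self.v(torch.tanh(
            self.W_static(f) + self.W_dynamic(g))).squeeze(-1)
        return F.softmax(energies, dim=-1)  # new alignment: [B, T]
```

Because there is no content term in the energies, the alignment can only move relative to where it was at the previous step, which is the property credited with the better generalization on long utterances.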
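On the vocoder side, a WaveRNN-style model predicts audio one sample at a time with a compact recurrent network conditioned on the (upsampled) mel-spectrogram. The sketch below is deliberately simplified: it uses a single softmax over 9-bit mu-law samples instead of WaveRNN's coarse/fine split, and all sizes are assumptions rather than this repository's configuration:

```python
import torch
import torch.nn as nn


class WaveRNNSketch(nn.Module):
    """Simplified WaveRNN-style vocoder core (teacher-forced training pass).

    Single categorical output over 9-bit mu-law samples; sizes are
    assumptions, not this repository's configuration.
    """

    def __init__(self, mel_dim=80, rnn_dim=512, n_classes=512):
        super().__init__()
        self.embed = nn.Embedding(n_classes, rnn_dim)  # previous audio sample
        self.mel_proj = nn.Linear(mel_dim, rnn_dim)    # local conditioning
        self.rnn = nn.GRU(rnn_dim, rnn_dim, batch_first=True)
        self.out = nn.Linear(rnn_dim, n_classes)

    def forward(self, prev_samples, mels_upsampled):
        # prev_samples: [B, T] int64; mels_upsampled: [B, T, mel_dim]
        # (mel frames must first be upsampled to one vector per audio sample)
        x = self.embed(prev_samples) + self.mel_proj(mels_upsampled)
        h, _ = self.rnn(x)
        return self.out(h)  # logits over the next sample: [B, T, n_classes]
```

At inference time the same network is run autoregressively: each generated sample is fed back in as `prev_samples` for the next step.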
All audio processing parameters, model hyperparameters, training configuration, etc. are specified in the `config/config.py` file.
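As a rough illustration of what lives in such a file, a config module of this kind usually collects entries like the following (all names and values here are hypothetical, not the repository's actual settings):

```python
# Hypothetical excerpt; see config/config.py for the real parameter names.
sampling_rate = 22050   # audio sample rate
n_fft = 2048            # FFT size used for spectrogram extraction
hop_length = 275        # hop between successive analysis frames, in samples
num_mels = 80           # number of mel-spectrogram channels
batch_size = 32         # training batch size
learning_rate = 1e-3    # optimizer learning rate
use_amp = True          # train with automatic mixed precision
```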
Both the seq2seq model and the vocoder model need to be trained separately. Training with automatic mixed precision is supported.
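Mixed-precision training in PyTorch is typically wired up with `torch.cuda.amp`; the loop below is a generic sketch with placeholder `model`, `optimizer`, and `train_loader` names, not this repository's training code:

```python
import torch

scaler = torch.cuda.amp.GradScaler()   # loss scaler to avoid fp16 underflow

for texts, mels in train_loader:       # placeholder data loader
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # forward pass runs in mixed precision
        loss = model(texts, mels)
    scaler.scale(loss).backward()      # backprop through the scaled loss
    scaler.step(optimizer)             # unscale gradients, then step
    scaler.update()                    # adapt the scale factor
```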
- Download and extract the dataset
  - English single-speaker dataset LJSpeech:

    ```sh
    wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
    tar -xvjf LJSpeech-1.1.tar.bz2
    ```
- Edit the configuration parameters in `config/config.py` as appropriate for the dataset to be used for training.
- Process the downloaded dataset, and split it into train and eval splits:

  ```sh
  python preprocess.py \
      --dataset_dir <Path to the root of the downloaded dataset> \
      --out_dir <Output path to write the processed dataset>
  ```
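  For example, with LJSpeech extracted into the working directory (the paths here are arbitrary):

  ```sh
  python preprocess.py --dataset_dir LJSpeech-1.1 --out_dir data/ljspeech
  ```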
- Train the Tacotron (seq2seq) model:

  ```sh
  python train_tts.py \
      --train_data_dir <Path to the processed train split> \
      --checkpoint_dir <Path to location where training checkpoints will be saved> \
      --alignments_dir <Path to the location where training alignments will be saved> \
      --resume_checkpoint_path <If specified, load checkpoint and resume training>
  ```
- Train the vocoder model:

  ```sh
  python train_vocoder.py \
      --train_data_dir <Path to the processed train split> \
      --checkpoint_dir <Path to location where training checkpoints will be saved> \
      --resume_checkpoint_path <If specified, load checkpoint and resume training>
  ```
- Prepare the text to be synthesized

  The text to be synthesized should be placed in the `synthesis.csv` file in the following format:

  ```
  ID_1|TEXT_1
  ID_2|TEXT_2
  .
  .
  .
  ```
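  For example (the IDs and sentences are arbitrary):

  ```
  utt_001|The birch canoe slid on the smooth planks.
  utt_002|Glue the sheet to the dark blue background.
  ```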
- Text-to-speech synthesis:

  ```sh
  python tts_synthesis.py \
      --synthesis_file <Path to the synthesis.csv file (created in the previous step)> \
      --seq2seq_checkpoint <Path to the trained seq2seq model to use for synthesis> \
      --vocoder_checkpoint <Path to the trained vocoder model to use for synthesis> \
      --out_dir <Path to where the synthesized waveforms will be written to disk>
  ```
This code is based on the code in the following repositories:
- Location Relative Attention Mechanisms for Robust Long-Form Speech Synthesis
- Tacotron: Towards End-To-End Speech Synthesis
- Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Planned future work:
- Support for multi-speaker models
- Support for Indic languages