fatchord/WaveRNN

WaveRNN

(Update: Vanilla Tacotron One TTS system just implemented - more coming soon!)

Tacotron with WaveRNN diagrams

PyTorch implementation of DeepMind's WaveRNN model from Efficient Neural Audio Synthesis

Installation

Ensure you have:

Then install the rest with pip:

pip install -r requirements.txt

How to Use

Quick Start

If you want to use TTS functionality immediately you can simply use:

python quick_start.py

This will generate everything in the default sentences.txt file and output to a new 'quick_start' folder, where you can play back the wav files and take a look at the attention plots.

You can also use that script to generate custom TTS sentences and/or use '-u' to generate unbatched (better audio quality):

python quick_start.py -u --input_text "What will happen if I run this command?"
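To synthesize several custom sentences one after another, the command above can be wrapped in a small script. This is only a minimal sketch: `quick_start_command` is a hypothetical helper, not part of the repository. It uses only the `-u` and `--input_text` options shown above and assumes it is run from the repository root.

```python
import shlex
import sys


def quick_start_command(text, unbatched=True):
    """Build the argument list for one quick_start.py call (hypothetical helper)."""
    cmd = [sys.executable, "quick_start.py"]
    if unbatched:
        cmd.append("-u")  # unbatched generation, as described above
    cmd += ["--input_text", text]
    return cmd


# Example sentences; replace with your own.
sentences = [
    "What will happen if I run this command?",
    "This is a second test sentence.",
]

for sentence in sentences:
    # Print each command with shell quoting; pass the list to
    # subprocess.run(cmd, check=True) instead to actually execute it.
    print(" ".join(shlex.quote(part) for part in quick_start_command(sentence)))
```

Building the command as a list rather than a single string avoids quoting problems when a sentence contains quotes or other shell metacharacters.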

Training your own Models

Attention and Mel Training GIF

Download the LJSpeech Dataset.

Edit hparams.py, point wav_path to your dataset and run:

python preprocess.py

Alternatively, use preprocess.py --path to point directly to the dataset.
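Before preprocessing, it can save time to confirm that the dataset path really contains audio. The following is a minimal sketch, assuming the dataset is a directory with `.wav` files somewhere beneath it; `count_wavs` is a hypothetical helper and the example path is a placeholder, neither comes from the repository.

```python
from pathlib import Path


def count_wavs(dataset_path):
    """Return the number of .wav files under dataset_path (0 if it is not a directory)."""
    root = Path(dataset_path).expanduser()
    if not root.is_dir():
        return 0
    # Search recursively, since the wav files may sit in a subfolder.
    return sum(1 for p in root.rglob("*.wav") if p.is_file())


if __name__ == "__main__":
    path = "path/to/your/dataset"  # placeholder, use the value you set for wav_path
    print(f"{count_wavs(path)} wav files found under {path}")
```

If the count is zero, the path is probably wrong and is worth fixing before starting preprocessing.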


Here's my recommendation on what order to run things:

1 - Train Tacotron with:

python train_tacotron.py

2 - You can let that finish training, or at any point you can use:

python train_tacotron.py --force_gta

This will force Tacotron to create a GTA dataset even if it hasn't finished training.

3 - Train WaveRNN with:

python train_wavernn.py --gta

NB: You can always just run train_wavernn.py without --gta if you're not interested in TTS.

4 - Generate Sentences with both models using:

python gen_tacotron.py wavernn

This will generate the default sentences. If you want to generate custom sentences you can use:

python gen_tacotron.py --input_text "this is whatever you want it to be" wavernn

And finally, you can always use --help on any of those scripts to see what options are available :)
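The recommended order above can be summarized as a short script. This is a sketch rather than part of the repository: `training_pipeline` is a hypothetical helper that only assembles the commands listed in steps 1 to 4 (with the `--force_gta`, `--gta` and `--input_text` options shown there) and prints them in order.

```python
import shlex


def training_pipeline(force_gta=False, input_text=None):
    """Return the commands from steps 1-4 in the recommended order (hypothetical helper)."""
    steps = [["python", "train_tacotron.py"]]  # step 1: train Tacotron
    if force_gta:
        # step 2 (optional): create the GTA dataset before training has finished
        steps.append(["python", "train_tacotron.py", "--force_gta"])
    steps.append(["python", "train_wavernn.py", "--gta"])  # step 3: train WaveRNN
    generate = ["python", "gen_tacotron.py"]  # step 4: generate sentences
    if input_text is not None:
        generate += ["--input_text", input_text]
    generate.append("wavernn")
    steps.append(generate)
    return steps


for number, cmd in enumerate(training_pipeline(force_gta=True), start=1):
    print(number, " ".join(shlex.quote(part) for part in cmd))
```

Each entry is an argument list, so it can be passed to `subprocess.run` unchanged if you prefer to launch the steps from Python instead of typing them by hand.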

Samples

Can be found here.

Pretrained Models

Currently there are two pretrained models available in the /pretrained/ folder:

Both are trained on LJSpeech

  • WaveRNN (Mixture of Logistics output) trained to 800k steps
  • Tacotron trained to 180k steps

References

Acknowledgements