An implementation of WaveNet: A Generative Model for Raw Audio (https://arxiv.org/abs/1609.03499).
This project originated from the hands-on lecture of SPCC 2018. It rewrites the lecture code with the following criteria:
- Simple, modular and easy to read
- Using high-level TensorFlow APIs: `tf.layers.Layer`, `tf.data.Dataset`, and `tf.estimator.Estimator` (see the sketch after this list)
- Fixing a discrepancy between training and inference results that previously forced a workaround of discarding the wrong samples at the early steps of inference
- Reviewing the lecture and deepening my understanding
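For orientation, here is a minimal sketch (not this project's actual model; the architecture and hyper-parameters are placeholders) of how these APIs fit together: a `tf.data.Dataset` feeds the input function, the network is built from `tf.layers` ops, and `tf.estimator.Estimator` drives training.

```python
import tensorflow as tf

def input_fn():
    # Hypothetical stand-in for this project's TFRecord input pipeline.
    waveforms = tf.random_normal([64, 100, 1])
    targets = tf.random_uniform([64, 100], maxval=256, dtype=tf.int32)
    dataset = tf.data.Dataset.from_tensor_slices((waveforms, targets))
    return dataset.batch(8).repeat()

def model_fn(features, labels, mode):
    # Left-pad by (kernel_size - 1) so the convolution only sees the past.
    padded = tf.pad(features, [[0, 0], [1, 0], [0, 0]])
    logits = tf.layers.conv1d(padded, filters=256, kernel_size=2)  # causal
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer(1e-3).minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn, model_dir="/tmp/wavenet_sketch")
estimator.train(input_fn, steps=10)
```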
This project has the following limitations:
- LJSpeech is the only supported dataset
- None of the sophisticated initialization, optimization, and regularization techniques from the lecture
- Lack of hyper-parameter tuning.
- The generated audio is confirmed to be of low quality
For research-ready implementations, please refer to
This implementation was tested with a Tesla K20c (4.94 GiB of GPU memory).
This project requires Python >= 3.6 and TensorFlow >= 1.8.
The other dependencies can be installed with conda:

```
conda env create -f environment.yml
```
The following packages are installed:
- pyspark=2.3.1
- librosa==0.6.1
- matplotlib=2.2.2
- hypothesis=3.59.1
- docopt=0.6.2
The following pre-processing command extracts mel-spectrograms and serializes waveforms, mel-spectrograms, and other metadata into the TFRecord format (protocol buffers with a content-hash header).
```
python preprocess.py ljspeech /path/to/input/corpus/dir /path/to/output/dir/of/preprocessed/data
```
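As an optional sanity check (a sketch, not part of this project; the file name is hypothetical), the generated files can be counted record by record with `tf.python_io.tf_record_iterator`, which also verifies each record's checksummed framing while reading:

```python
import tensorflow as tf

# Hypothetical file name; any .tfrecord produced by preprocess.py works.
path = "/path/to/output/dir/of/preprocessed/data/LJ001-0001.tfrecord"
num_records = sum(1 for _ in tf.python_io.tf_record_iterator(path))
print("{}: {} record(s)".format(path, num_records))
```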
After pre-processing, split the data into training, validation, and test sets.
A simple way to create the list files is the `ls` command:

```
ls /path/to/output/dir/of/preprocessed/data | sed 's/\.tfrecord$//' > list.txt
```
Then split `list.txt` into three files.
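One possible way, as a sketch (the shuffle, the split sizes, and the output file names are assumptions; use whichever paths you pass to `--training-list-file`, `--validation-list-file`, and `--test-list-file`):

```python
import random

# Read the names produced by the ls/sed step above.
with open("list.txt") as f:
    names = [line.strip() for line in f if line.strip()]

random.seed(1234)                # fixed seed so the split is reproducible
random.shuffle(names)

n_validation, n_test = 100, 100  # hypothetical split sizes
splits = {
    "test.txt": names[:n_test],
    "validation.txt": names[n_test:n_test + n_validation],
    "training.txt": names[n_test + n_validation:],
}
for filename, subset in splits.items():
    with open(filename, "w") as out:
        out.write("\n".join(subset) + "\n")
```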
Run training with `train.py`:

```
python train.py --data-root=/path/to/output/dir/of/preprocessed/data --checkpoint-dir=/path/to/checkpoint/dir --dataset=ljspeech --training-list-file=/path/to/file/listing/training/data --validation-list-file=/path/to/file/listing/validation/data --log-file=/path/to/log/file
```
You can see training and validation losses in the log file and on TensorBoard:

```
tensorboard --logdir=/path/to/checkpoint/dir
```
(orange line: training loss, blue line: validation loss)
At validation time, waveforms predicted with teacher forcing are saved as images in the checkpoint directory.
(above: natural, below: predicted)
Run prediction with `predict.py`:

```
python predict.py --data-root=/path/to/output/dir/of/preprocessed/data --checkpoint-dir=/path/to/checkpoint/dir --dataset=ljspeech --test-list-file=/path/to/file/listing/test/data --output-dir=/path/to/output/dir
```
At prediction time, predicted samples are generated as audio files and image files.
(above: natural, below: predicted)
Causal convolution is implemented in two different ways. At training time, causal convolution is executed in parallel with an optimized CUDA kernel. At inference time, it is executed sequentially with matrix multiplication. The results of the two implementations should be identical, and this project checks their equality with property-based tests.
```
python -m unittest ops/convolutions_test.py
python -m unittest layers/modules_test.py
```
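To illustrate the property being tested, here is a minimal NumPy sketch (simplified assumptions; not this project's actual ops): the parallel version convolves the whole left-padded sequence at once, while the sequential version keeps a queue of the last K inputs and produces each output with a single matrix multiplication. Both must agree exactly.

```python
import numpy as np

def causal_conv_parallel(x, w):
    """Whole-sequence causal convolution: x is (T, in_ch), w is (K, in_ch, out_ch).
    Left-padding with K-1 zeros makes output t depend only on inputs <= t."""
    K = w.shape[0]
    padded = np.concatenate([np.zeros((K - 1, x.shape[1])), x], axis=0)
    return np.stack([np.einsum("ki,kio->o", padded[t:t + K], w)
                     for t in range(x.shape[0])])

def causal_conv_sequential(x, w):
    """Step-by-step version as used at inference: a queue holds the last K
    inputs and each output is one matrix multiplication."""
    K, in_ch, out_ch = w.shape
    queue = np.zeros((K, in_ch))
    w_flat = w.reshape(K * in_ch, out_ch)   # fold all taps into one matrix
    out = []
    for x_t in x:
        queue = np.concatenate([queue[1:], x_t[None]], axis=0)  # push newest
        out.append(queue.reshape(-1) @ w_flat)
    return np.stack(out)

rng = np.random.RandomState(0)
x = rng.normal(size=(10, 3))                # T=10 steps, 3 input channels
w = rng.normal(size=(2, 3, 4))              # kernel size 2, 4 output channels
assert np.allclose(causal_conv_parallel(x, w), causal_conv_sequential(x, w))
```

The project's property-based tests (note the hypothesis dependency above) run this kind of comparison over randomly generated inputs.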