This repository contains the official PyTorch implementation of SQ-VAE.
- Ensure you have Python 3 and PyTorch 1.4 or greater.
- Install NVIDIA/apex for mixed precision training.
- Install pip dependencies: `pip install -r requirements.txt`
- Download and extract the ZeroSpeech2020 dataset. For reproduction, only `zerospeech2020.z01`, `zerospeech2020.z02`, and `zerospeech2020.zip` are required (follow the official instructions to extract the dataset).
- Download the train/test splits here and extract them in the root directory of the repo.
- Preprocess the audio and extract train/test log-Mel spectrograms (a rough sketch of this feature extraction follows this list):

  `python preprocess.py in_dir=/path/to/dataset dataset=2019/english`

  Note: `in_dir` must be the path to the `2019` folder, e.g. `python preprocess.py in_dir=../datasets/2020/2019 dataset=2019/english`
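For intuition, here is a minimal sketch of log-Mel feature extraction using librosa. The sample rate, FFT size, hop length, and number of mel bins below are placeholders, not the values used by `preprocess.py`; the authoritative settings live in this repo's preprocessing config.

```python
import numpy as np
import librosa

def log_mel_spectrogram(path, sr=16000, n_fft=2048, hop_length=160, n_mels=80):
    # Placeholder parameters; the real values come from the preprocessing config.
    wav, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    # Clamp before the log to avoid -inf on silent frames.
    return np.log(np.maximum(mel, 1e-10))
```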
Train a model:

`python train.py checkpoint_dir=path/to/checkpoint_dir dataset=2019/english`

e.g. `python train.py checkpoint_dir=checkpoints/2019english dataset=2019/english`

Note: the default parameterization of the variance is Gaussian SQ-VAE (IV), `"gaussian_4"`. You can switch the parameterization in `config/model/default.yaml`: Gaussian SQ-VAE (I) `"gaussian_1"`, Gaussian SQ-VAE (III) `"gaussian_3"`, and Gaussian SQ-VAE (IV) `"gaussian_4"`.
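To double-check which parameterization is active before launching a run, something like the sketch below can be used. It assumes the Hydra-style configs under `config/` are plain YAML files loadable with OmegaConf, which may differ from how the training script actually composes them.

```python
from omegaconf import OmegaConf

# Load the model config and print it; look for the variance parameterization
# (gaussian_1 / gaussian_3 / gaussian_4) among its entries.
cfg = OmegaConf.load("config/model/default.yaml")
print(OmegaConf.to_yaml(cfg))
```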
Evaluate a trained model (MSE):

`python evaluate_mse.py checkpoint=path/to/checkpoint in_dir=path/to/wavs evaluation_list=path/to/evaluation_list dataset=2019/english`
Note: the evaluation list is a `json` file:

    [
        [
            "english/train/parallel/voice/V001_0000000047.wav",
            "V001"
        ]
    ]

containing a list of items with a) the path (relative to `in_dir`) of the source `wav` files; and b) the target speaker (see `datasets/2019/english/speakers.json` for a list of options).
e.g. `python evaluate_mse.py checkpoint=checkpoints/2019english/model.ckpt-500000.pt in_dir=../datasets/2020/2019 evaluation_list=datasets/2019/english/mse_evaluation.json dataset=2019/english`
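To build a custom evaluation list, a minimal sketch like the one below is enough; `my_evaluation.json` is a placeholder filename, and the target speaker IDs must come from `datasets/2019/english/speakers.json`.

```python
import json

# Each entry is [source wav path relative to in_dir, target speaker ID].
evaluation_list = [
    ["english/train/parallel/voice/V001_0000000047.wav", "V001"],
]

with open("my_evaluation.json", "w") as f:  # placeholder filename
    json.dump(evaluation_list, f, indent=2)
```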
- Install bootphon/zerospeech2020.
- Encode the test data for evaluation (a sanity-check sketch for the encoded output follows this list):

  `python encode.py checkpoint=path/to/checkpoint out_dir=path/to/out_dir dataset=2019/english`

  e.g. `python encode.py checkpoint=checkpoints/2019english/model.ckpt-500000.pt out_dir=submission/2019/english/test dataset=2019/english`
- Run the ABX evaluation script (see bootphon/zerospeech2020).
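As a rough sanity check of the encoded output, something like the sketch below can be used. It assumes `encode.py` writes one plain-text feature file per test utterance under `out_dir` (whitespace-separated floats, one frame per line, as expected by the ZeroSpeech tooling); if the actual output format differs, adjust accordingly.

```python
import glob
import numpy as np

# Load every encoded utterance under the submission directory and report its shape.
for path in sorted(glob.glob("submission/2019/english/test/*.txt")):
    feats = np.loadtxt(path)
    print(path, feats.shape)  # (num_frames, feature_dim)
```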
This work is based on:

- van Niekerk, Nortje, and Kamper. "Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge." INTERSPEECH. 2020.
- Chorowski, Jan, et al. "Unsupervised speech representation learning using WaveNet autoencoders." IEEE/ACM Transactions on Audio, Speech, and Language Processing 27.12 (2019): 2041-2053.
- Lorenzo-Trueba, Jaime, et al. "Towards achieving robust universal neural vocoding." INTERSPEECH. 2019.
- van den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." Advances in Neural Information Processing Systems. 2017.
Cite as:

@INPROCEEDINGS{takida2022sq-vae,
    author={Takida, Yuhta and Shibuya, Takashi and Liao, Wei-Hsiang and Lai, Chieh-Hsin and Ohmura, Junki and Uesaka, Toshimitsu and Murata, Naoki and Takahashi, Shusuke and Kumakura, Toshiyuki and Mitsufuji, Yuki},
    title={SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization},
    booktitle={International Conference on Machine Learning},
    year={2022},
}