Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning

This repository contains the code and generated sound samples of our paper "Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning", which was accepted for MLSP 2021.

Set up environment

  • Clone the repository:

    git clone https://github.com/liuxubo717/sound_generation.git
    
  • Create conda environment with dependencies:

    conda env create -f environment.yml -n sound_generation
    
  • Activate conda environment:

    conda activate sound_generation
    

Prepare dataset

Usage

1: (Stage 1) Train a multi-scale VQ-VAE to extract the Discrete T-F Representation (DTFR) of sound:

python train_vqvae.py --epoch 800
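
For orientation, the sketch below illustrates the core vector quantization step at the heart of a VQ-VAE: encoder outputs are mapped to their nearest codebook entries, producing the discrete codes that make up the DTFR. This is an illustrative sketch only, not code from this repository; the function name and tensor shapes are assumptions.

    import torch

    def quantize(z_e, codebook):
        """Map encoder outputs to their nearest codebook entries.

        z_e: (batch, channels, height, width) encoder output (assumed shape).
        codebook: (num_codes, channels) learned embedding table.
        """
        b, c, h, w = z_e.shape
        flat = z_e.permute(0, 2, 3, 1).reshape(-1, c)         # (b*h*w, c)
        # Squared Euclidean distance from every vector to every codebook entry
        dist = (flat.pow(2).sum(1, keepdim=True)
                - 2 * flat @ codebook.t()
                + codebook.pow(2).sum(1))
        indices = dist.argmin(1)                              # discrete codes
        z_q = codebook[indices].reshape(b, h, w, c).permute(0, 3, 1, 2)
        return z_q, indices.reshape(b, h, w)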

2: Extract DTFR for stage 2 training:

python extract_code.py --ckpt checkpoint/[VQ-VAE CHECKPOINT]
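
As a rough illustration of what this step produces, the sketch below iterates over a dataset and stores the discrete index maps returned by a trained VQ-VAE. The `vqvae.encode` method and the data loader are assumed names for illustration, not this repository's actual API.

    import torch

    @torch.no_grad()
    def extract_codes(vqvae, loader, device="cuda"):
        # `vqvae.encode` returning (quantized output, code indices) is an assumption.
        vqvae.eval()
        all_codes, all_labels = [], []
        for mel, label in loader:                  # mel: (batch, 1, n_mels, frames)
            _, codes = vqvae.encode(mel.to(device))
            all_codes.append(codes.cpu())
            all_labels.append(label)
        return torch.cat(all_codes), torch.cat(all_labels)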

3: (Stage 2) Train a PixelSNAIL model on the extracted DTFR of sound:

python train_pixelsnail.py --epoch 2000
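
Conceptually, the autoregressive prior is trained with a cross-entropy loss over the extracted code indices, conditioned on the class label. The minimal training-loop sketch below is hypothetical; the model's call signature and the data loader are assumptions, not this repository's code.

    import torch
    import torch.nn.functional as F

    def train_prior_epoch(pixelsnail, loader, optimizer, device="cuda"):
        pixelsnail.train()
        for codes, label in loader:                # codes: (batch, H, W) int64
            codes, label = codes.to(device), label.to(device)
            # Assumed signature: logits over codebook entries at every position
            logits, _ = pixelsnail(codes, condition=label)
            loss = F.cross_entropy(logits, codes)  # logits: (batch, n_codes, H, W)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()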

4: Sample mel-spectrograms of sound from the trained PixelSNAIL model:

python mel_sample.py --vqvae checkpoint/[VQ-VAE CHECKPOINT] --bottom checkpoint/[PixelSNAIL CHECKPOINT] --label [Class ID: 0-9]
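
Sampling proceeds autoregressively: each position of the code map is drawn from the model's predicted distribution given the previously sampled positions and the class label, and the completed code map is then decoded to a mel-spectrogram by the VQ-VAE decoder. The sketch below illustrates the idea; the code-map shape and the model's call signature are assumptions.

    import torch

    @torch.no_grad()
    def sample_codes(pixelsnail, label, shape, temperature=1.0, device="cuda"):
        # shape: (height, width) of the code map; model-dependent assumption.
        h, w = shape
        codes = torch.zeros(1, h, w, dtype=torch.long, device=device)
        cond = torch.tensor([label], device=device)
        for i in range(h):
            for j in range(w):
                logits, _ = pixelsnail(codes, condition=cond)
                probs = torch.softmax(logits[:, :, i, j] / temperature, dim=1)
                codes[:, i, j] = torch.multinomial(probs, 1).squeeze(1)
        return codes  # decode with the VQ-VAE decoder to obtain a mel-spectrogram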

5: Synthesize waveforms of sound using the HiFi-GAN vocoder:

python mel2audio.py --input_mels_dir [INPUT MEL-SPECTROGRAM PATH] --output_dir [OUTPUT WAVEFORM PATH]

The trained HiFi-GAN checkpoint is provided in /hifi_gan/cp_hifigan/g_00335000
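
For reference, vocoding a sampled mel-spectrogram with HiFi-GAN amounts to a single forward pass through the generator, as in the hedged sketch below; the generator object and the assumed sample rate follow the public HiFi-GAN reference implementation and may differ from this repository's configuration.

    import torch
    import soundfile as sf

    def mel_to_wav(generator, mel, out_path, sample_rate=22050):
        # mel: (1, n_mels, frames) tensor; sample_rate is an assumed value.
        generator.eval()
        with torch.no_grad():
            wav = generator(mel).squeeze()         # (num_samples,)
        sf.write(out_path, wav.cpu().numpy(), sample_rate)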

Generated samples

The generated sound samples are available at /generated_sounds

Cite

If you use our code, please cite the following:

@article{liu2021conditional,
  title={Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning},
  author={Liu, Xubo and Iqbal, Turab and Zhao, Jinzheng and Huang, Qiushi and Plumbley, Mark D and Wang, Wenwu},
  journal={arXiv preprint arXiv:2107.09998},
  year={2021}
}
