Clone of the SawSing DDSP vocoder's official implementation.
```bash
# pip install "torch==1.11.0" -q      # Based on your environment (validated with vX.YZ)
# pip install "torchaudio==0.11.0" -q # Based on your environment
# pip install git+https://github.com/tarepan/SawSing-official
pip install -r requirements.txt
```
Place 24 kHz/16-bit .wav files in the directory structure below:
```
data
├─ solo            # speaker name
│  ├─ test         # scenario-common test
│  ├─ val          # scenario-common validation
│  ├─ train-full   # training scenario No.1
│  │  ├─ audio
│  │  │  ├─ xxx.wav  # place .wav files here
│  │  ├─ mel
│  │  │  ├─ xxx.npy  # auto-generated by preprocessing
```
Then, run preprocessing:
```bash
python preprocess.py
```
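After preprocessing, each clip should have a matching mel file. A quick sanity check (the path reuses the placeholder `xxx` from the tree above; the exact array layout is whatever preprocess.py writes):

```python
import numpy as np

# Load one auto-generated mel and inspect it; check preprocess.py for
# the authoritative frame/bin axis order and dtype.
mel = np.load("data/solo/train-full/mel/xxx.npy")
print(mel.shape, mel.dtype)
```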
Train vocoders from scratch.
- Modify the configuration file `./configs/<model_name>.yaml`
- Run the following command:
```bash
python main.py --config ./configs/sawsinsub.yaml \
               --stage training \
               --model SawSinSub
```
You can specify the model with the `--model` argument.
Currently this repository supports 5 harmonics-plus-noise vocoders [4] (3 from the paper, 2 additional):
Model Name (in the paper) | Harmonics Synthesizer | Note |
---|---|---|
SawSub | Subtracted sawtooth (exact) | modified from the SawSing paper |
SawSinSub (SawSing) | Subtracted sawtooth (additive approx.) | from the SawSing paper |
Sins (DDSP-Add) | Added sinusoids | from the DDSP paper |
Full | Subtracted added sinusoids | modified from the DDSP paper |
DWS (DWTS) | Wavetable | [3] |
SawSinSub differs from SawSub in that it approximates the sawtooth with band-limited additive sinusoids, which works as anti-aliasing.
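As an illustration, here is a minimal NumPy sketch of a band-limited additive sawtooth (the truncation point, scaling, and phase convention are illustrative assumptions, not the repository's exact synthesizer):

```python
import numpy as np

def bandlimited_saw(f0, sr=24000, dur=1.0):
    """Approximate a sawtooth by summing its harmonics up to Nyquist.

    A sawtooth's k-th harmonic has amplitude ~ 1/k; truncating the
    Fourier series at sr/2 is what prevents the aliasing of a naive
    (exact) sawtooth.
    """
    t = np.arange(int(sr * dur)) / sr
    k_max = int((sr / 2) // f0)  # highest harmonic below Nyquist
    saw = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, k_max + 1))
    return (2 / np.pi) * saw  # scale roughly into [-1, 1]
```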
For more details on the synthesizers, refer to synthesizer_demo.
[3] (ICASSP'22) Differentiable Wavetable Synthesis
[4] (ICASSP'93) HNS: Speech modification based on a harmonic+noise model
For validation (compute validation loss and real-time factor):
- Modify the configuration file `./configs/<model_name>.yaml`
- Run the following command:
```bash
# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml \
               --stage validation \
               --model SawSinSub \
               --model_ckpt ./exp/f1-full/sawsinsub-256/ckpts/vocoder_27740_70.0_params.pt \
               --output_dir ./test_gen
```
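For reference, real-time factor is wall-clock synthesis time divided by the duration of the generated audio (RTF < 1 means faster than real time). A minimal sketch, with a hypothetical `vocoder` callable rather than the repo's actual entry point:

```python
import time

def real_time_factor(vocoder, mel, sr=24000):
    """RTF = synthesis wall-clock time / duration of the generated audio.

    `vocoder` is any mel -> waveform callable (hypothetical stand-in).
    """
    start = time.perf_counter()
    wav = vocoder(mel)
    elapsed = time.perf_counter() - start
    return elapsed / (len(wav) / sr)
```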
Both CLI and Python usage are supported. For details, jump to ☞ and check it.
Run mel-to-wave inference. The code and specification for extracting mel-spectrograms can be found in preprocess.py; a hedged extraction sketch follows the command below.
```bash
# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml \
               --stage inference \
               --model SawSinSub \
               --model_ckpt ./exp/f1-full/sawsinsub-256/ckpts/vocoder_27740_70.0_params.pt \
               --input_dir ./path/to/mel \
               --output_dir ./test_gen
```
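Input mels must match the specification in preprocess.py. As a hedged sketch only, extraction with the pinned torchaudio typically looks like the following (the `n_fft`/`hop_length`/`n_mels` values and the log scaling are illustrative assumptions; the authoritative values live in preprocess.py):

```python
import torch
import torchaudio

# Illustrative parameters only -- the authoritative spec is in preprocess.py.
wav, sr = torchaudio.load("data/solo/train-full/audio/xxx.wav")  # 24 kHz input
to_mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sr, n_fft=1024, hop_length=256, n_mels=80)
mel = torch.log(to_mel(wav) + 1e-6)  # log-mel is a common convention
```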
To mitigate SawSing's buzzing artifacts, run post-processing. For more details, please refer to here.
- training
  - x.x [iter/sec] @ NVIDIA X0 on Google Colaboratory (AMP+)
  - takes about y days for the whole training
  - the original authors used a single NVIDIA RTX 3090 Ti GPU
- inference
  - z.z [sec/sample] @ xx
The authors provide checkpoints and experiment records. Great!
- Checkpoints
  - Sins (DDSP-Add): `./exp/f1-full/sins/ckpts/`
  - SawSinSub (SawSing): `./exp/f1-full/sawsinsub-256/ckpts/`
- The full experimental records, reports, and checkpoints can be found under the `exp` folder.
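To peek at a released checkpoint before wiring it into main.py, a hedged sketch (whether the file stores a bare state dict or a wrapped dict is defined by the repo, so the code only inspects what it finds):

```python
import torch

# Inspect the checkpoint contents without assuming its exact structure.
ckpt = torch.load(
    "./exp/f1-full/sawsinsub-256/ckpts/vocoder_27740_70.0_params.pt",
    map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt)[:5])  # first few keys
else:
    print(type(ckpt))
```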
- glitch artifacts: see also [5]
- buzzing artifacts
  - occur only in the subtractive synthesizers (SawSub, SawSinSub, Full); see also [6]
  - possible solutions
    - replace the LTV-FIR filter with a better one
    - apply a UV mask (see the sketch after this list)
- E2E training: data-efficient, interpretable, and lightweight -> joint training with acoustic models
- Feature: mel-spectrograms -> controllable features, e.g. f0, UV mask
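As an illustration of the UV-mask idea above, a minimal sketch (the frame-level f0 convention with 0 Hz marking unvoiced frames and the hop size are assumptions; this is not the repository's implementation):

```python
import numpy as np

def apply_uv_mask(harmonic, noise, f0, hop=256):
    """Silence the harmonic branch on unvoiced frames; keep the noise branch.

    `f0` is a per-frame contour with 0 Hz marking unvoiced frames
    (an assumed convention for this sketch).
    """
    uv = (f0 > 0).astype(np.float32)            # 1 = voiced, 0 = unvoiced
    mask = np.repeat(uv, hop)[: len(harmonic)]  # frame mask -> sample mask
    return harmonic * mask + noise
```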
[5] (ICASSP'22) Improving adversarial waveform generation based singing voice conversion with harmonic signals
[6] (INTERSPEECH'22) Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation
```bibtex
@article{sawsing,
  title   = {DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation},
  author  = {Da-Yi Wu and Wen-Yi Hsiao and Fu-Rong Yang and Oscar Friedman and Warren Jackson and Scott Bruzenak and Yi-Wen Liu and Yi-Hsuan Yang},
  journal = {Proc. International Society for Music Information Retrieval},
  year    = {2022},
}
```
- Any preceding works