SawSing - Subtractive DDSP Vocoder for singing voice

ColabBadge PaperBadge

A clone of the official implementation of the SawSing DDSP vocoder.

Demo

demo page.

Usage

Install

# pip install "torch==1.11.0" -q      # Based on your environment (validated with vX.YZ)
# pip install "torchaudio==0.11.0" -q # Based on your environment
# pip install git+https://github.com/tarepan/SawSing-official
pip install -r requirements.txt 

Dataset & Preprocessing

Place 24 kHz / 16-bit .wav files in the following directory structure:

data          
├─ solo              # speaker name
│  ├─ test           # scenario-common test
│  ├─ val            # scenario-common validation
│  ├─ train-full     # training scenario No.1
│  │  ├─  audio
│  │  │  ├─  xxx.wav # place .wav here
│  │  ├─  mel
│  │  │  ├─  xxx.npy # Auto-generated by preprocessing
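
Before preprocessing, it can help to confirm that every file matches the 24 kHz / 16-bit requirement. The following is a minimal sketch, not part of this repository: the helper names `check_wav` and `find_bad_wavs` are hypothetical, and the glob pattern assumes the `data/<speaker>/<split>/audio/*.wav` layout shown above.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 24000  # Hz, as required above
EXPECTED_WIDTH = 2     # bytes per sample, i.e. 16 bit


def check_wav(path):
    """Return a list of problems found in one .wav file (empty if none)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {wav.getframerate()} Hz, expected {EXPECTED_RATE}")
        if wav.getsampwidth() != EXPECTED_WIDTH:
            problems.append(f"sample width {wav.getsampwidth()} bytes, expected {EXPECTED_WIDTH}")
    return problems


def find_bad_wavs(data_root):
    """Collect (path, problems) for every non-conforming file under data_root."""
    bad = []
    # Assumed layout: <data_root>/<speaker>/<split>/audio/<name>.wav
    for path in sorted(Path(data_root).glob("*/*/audio/*.wav")):
        problems = check_wav(path)
        if problems:
            bad.append((path, problems))
    return bad


if __name__ == "__main__":
    for path, problems in find_bad_wavs("data"):
        print(path, "; ".join(problems))
```

The standard-library `wave` module only reads PCM WAV files, so files in other encodings will raise an error instead of being reported.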

Then, run preprocessing:

python preprocess.py

Train

Train vocoders from scratch.

  1. Modify the configuration file ./configs/<model_name>.yaml
  2. Run the following command:
python main.py --config ./configs/sawsinsub.yaml \
               --stage  training \
               --model SawSinSub

You can specify the model with the --model argument.
Currently, this repository supports 5 harmonic-plus-noise vocoders [4] (3 from the paper, 2 not):

Model Name (in the paper)   Harmonics Synthesizer                    Note
SawSub                      Subtracted Sawtooth (exact)              modified from SawSing paper
SawSinSub (SawSing)         Subtracted Sawtooth (additive approx.)   from SawSing paper
Sins (DDSP-Add)             Added sinusoids                          from DDSP paper
Full                        Subtracted Added sinusoids               modified from DDSP paper
DWS (DWTS)                  Wavetable                                [3]

SawSinSub differs from SawSub in that it approximates the sawtooth with band-limited additive sinusoids, which acts as anti-aliasing.
For more details on the synthesizers, refer to synthesizer_demo.
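
To illustrate the band-limited additive approximation, here is a minimal sketch of the general technique: a sawtooth is built from its Fourier series, truncated to the harmonics below the Nyquist frequency. This is not the repository's implementation (which is written for PyTorch and may differ in scaling, phase, and time-varying f0 handling); the function names and the constant f0 are assumptions for illustration.

```python
import math


def num_harmonics(f0, sample_rate):
    """Number of harmonics k with k * f0 strictly below the Nyquist frequency."""
    nyquist = sample_rate / 2
    n = int(nyquist // f0)
    if n * f0 >= nyquist:
        n -= 1
    return n


def bandlimited_saw(f0, sample_rate, num_samples):
    """Falling sawtooth from a truncated Fourier series: (2/pi) * sum_k sin(k*phase)/k."""
    n_harm = num_harmonics(f0, sample_rate)
    out = []
    for n in range(num_samples):
        phase = 2 * math.pi * f0 * n / sample_rate
        # Harmonics at or above Nyquist are omitted, so nothing can alias.
        s = sum(math.sin(k * phase) / k for k in range(1, n_harm + 1))
        out.append(2 / math.pi * s)
    return out


if __name__ == "__main__":
    print(num_harmonics(440, 24000))  # harmonics kept for A4 at 24 kHz
    print(bandlimited_saw(1000, 24000, 4))
```

An exact sawtooth sampled directly contains harmonics above Nyquist that fold back into the audible band; dropping those terms from the sum is what provides the anti-aliasing.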

[3] (ICASSP'22)Differentiable Wavetable Synthesis
[4] (ICASSP'93) HNS: Speech modification based on a harmonic+noise model

For validation (compute validation loss and real-time factor):

  1. Modify the configuration file ./configs/<model_name>.yaml
  2. Run the following command:
# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml  \
              --stage validation \
              --model SawSinSub \
              --model_ckpt ./exp/f1-full/sawsinsub-256/ckpts/vocoder_27740_70.0_params.pt \
              --output_dir ./test_gen
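
For reference, the real-time factor (RTF) is conventionally the synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. The sketch below shows that standard definition only; how `main.py` measures it may differ, and `real_time_factor`, `measure_rtf`, and the `synthesize` callable are hypothetical names.

```python
import time


def real_time_factor(elapsed_sec, num_samples, sample_rate=24000):
    """RTF = processing time / duration of the generated audio."""
    audio_sec = num_samples / sample_rate
    return elapsed_sec / audio_sec


def measure_rtf(synthesize, mel, sample_rate=24000):
    """Time one call of a mel-to-wave function and return its RTF.

    `synthesize` is a placeholder for any callable that returns a sequence of samples.
    """
    start = time.perf_counter()
    audio = synthesize(mel)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(audio), sample_rate)


if __name__ == "__main__":
    # Dummy synthesizer that returns one second of silence.
    print(measure_rtf(lambda mel: [0.0] * 24000, mel=None))
```

When timing a model on a GPU, the device usually needs to be synchronized (for example with `torch.cuda.synchronize()`) before reading the clock, otherwise the measured time can be too short.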

Inference

Both CLI and Python inference are supported.
For details, open the Colab notebook (☞ ColabBadge).

mel-to-wave inference.
The code and specification for extracting mel-spectrograms can be found in preprocess.py.

# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml  \
              --stage inference \
              --model SawSinSub \
              --model_ckpt ./exp/f1-full/sawsinsub-256/ckpts/vocoder_27740_70.0_params.pt \
              --input_dir  ./path/to/mel \
              --output_dir ./test_gen

To reduce SawSing's buzzing artifacts, run post-processing.
For more details, please refer to here.

Results

Sample

Demo

Performance

  • training
    • x.x [iter/sec] @ NVIDIA X0 on Google Colaboratory (AMP+)
    • takes about y days for the whole training run
    • Original authors use Nvidia RTX 3090 Ti GPU x1
  • inference
    • z.z [sec/sample] @ xx

Pre-trained Models & records

The authors provide checkpoints and experiment records. Great!

Discussion and Future Work

  • glitch artifacts: see also [5]
  • buzzing artifacts
    • only in subtractive synthesizers (SawSub, SawSinSub, Full), see also [6]
    • possible solutions
      • Replace LTV-FIR with better filter
      • Applying UV mask
  • E2E training: data-efficient, interpretable and lightweight -> Joint training with acoustic models
  • Feature: mel-spectrograms -> controllable features, e.g. f0, UV mask
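
As a rough illustration of the UV (unvoiced/voiced) mask idea listed above, the sketch below silences a harmonic signal wherever the frame-level f0 marks a frame as unvoiced. This is one possible reading, not something this repository implements: the convention that unvoiced frames have f0 = 0, the hard 0/1 mask, and the function names are all assumptions.

```python
def uv_mask_from_f0(f0_frames, hop_size):
    """Expand frame-level voicing (f0 > 0) to a sample-level 0/1 mask."""
    mask = []
    for f0 in f0_frames:
        voiced = 1.0 if f0 > 0 else 0.0  # assumed convention: f0 == 0 means unvoiced
        mask.extend([voiced] * hop_size)
    return mask


def apply_uv_mask(harmonic, f0_frames, hop_size):
    """Zero the harmonic signal in unvoiced regions."""
    mask = uv_mask_from_f0(f0_frames, hop_size)
    return [h * m for h, m in zip(harmonic, mask)]


if __name__ == "__main__":
    harmonic = [0.5] * 8
    # Two frames of four samples each: the first unvoiced, the second voiced.
    print(apply_uv_mask(harmonic, f0_frames=[0.0, 220.0], hop_size=4))
```

A hard mask like this can click at voicing boundaries, so a practical version would likely smooth the transitions.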

[5] (ICASSP'22) Improving adversarial waveform generation based singing voice conversion with harmonic signals
[6] (INTERSPEECH'22) Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation

References

Original paper

PaperBadge

@article{sawsing,
  title={DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation},
  author={Da-Yi Wu and Wen-Yi Hsiao and Fu-Rong Yang and Oscar Friedman and Warren Jackson and Scott Bruzenak and Yi-Wen Liu and Yi-Hsuan Yang},
  journal = {Proc. International Society for Music Information Retrieval},
  year    = {2022},
}

Acknowledgements

  • Any preceding works
