https://arxiv.org/abs/2109.06912
Speech synthesis with fairseq.
- Autoregressive and non-autoregressive models
- Multi-speaker synthesis
- Audio preprocessing (denoising, VAD, etc.) for less curated data
- Automatic metrics for model development
- Data configuration similar to S2T (speech-to-text)
- Single-speaker synthesis on LJSpeech
- Multi-speaker synthesis on VCTK
- Multi-speaker synthesis on Common Voice
Please cite as:
@article{wang2021fairseqs2,
title={fairseq {S}$^2$: A Scalable and Integrable Speech Synthesis Toolkit},
author={Wang, Changhan and Hsu, Wei-Ning and Adi, Yossi and Polyak, Adam and Lee, Ann and Chen, Peng-Jen and Gu, Jiatao and Pino, Juan},
journal={arXiv preprint arXiv:2109.06912},
year={2021}
}
@inproceedings{ott2019fairseq,
title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
year = {2019}
}