
PyTorch implementation of non-autoregressive expressive (emotional, conversational) TTS (text-to-speech, speech synthesis) based on FastSpeech2, supporting English and Korean


Expressive-FastSpeech2 - PyTorch Implementation

Contributions

  1. Non-autoregressive Expressive TTS: This project aims to provide a cornerstone for future research and applications of non-autoregressive expressive TTS, including Emotional TTS and Conversational TTS. For datasets, the AIHub Multimodal Video AI dataset and the IEMOCAP database were chosen for Korean and English, respectively.

    Note: If you are interested in a GST-Tacotron- or VAE-Tacotron-like expressive, stylistic TTS model, but with non-autoregressive decoding, you may also be interested in STYLER [demo, code].

  2. Annotated Data Processing: This project sheds light on how to handle a new dataset, even in a different language, for the successful training of non-autoregressive emotional TTS.

  3. English and Korean TTS: In addition to English, this project gives a broad view of handling Korean for non-autoregressive TTS, where additional data processing must account for language-specific features (e.g., training the Montreal Forced Aligner on your own language and dataset). Please look closely into text/.
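
One concrete example of such language-specific text processing is splitting precomposed Hangul syllables into their jamo (initial/medial/final) so a grapheme-to-phoneme front end or forced aligner can work on sub-syllable units. The sketch below is illustrative only (it is not the repo's actual `text/` code); it uses the standard Unicode code-point arithmetic for the Hangul syllable block:

```python
# Decompose precomposed Hangul syllables (U+AC00..U+D7A3) into jamo.
# This is a minimal, hypothetical sketch of the kind of preprocessing a
# Korean TTS front end needs; it is not the project's actual implementation.
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")            # 19 initial consonants
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")       # 21 medial vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 finals + "none"

def decompose_hangul(text: str) -> list[str]:
    """Split each Hangul syllable into its jamo; pass other characters through."""
    jamo = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:                      # 19 * 21 * 28 syllables
            jamo.append(CHOSEONG[code // 588])     # 588 = 21 * 28
            jamo.append(JUNGSEONG[(code % 588) // 28])
            final = JONGSEONG[code % 28]
            if final:                              # skip the empty final
                jamo.append(final)
        else:
            jamo.append(ch)                        # non-Hangul stays unchanged
    return jamo
```

For example, `decompose_hangul("한")` yields the three jamo ㅎ, ㅏ, ㄴ.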

Repository Structure

In this project, FastSpeech2 is adopted as the base non-autoregressive multi-speaker TTS framework, so it would be helpful to read the paper and code first (also see the FastSpeech2 branch).

  1. Emotional TTS: The following branches contain implementations of the basic paradigm introduced by Emotional End-to-End Neural Speech Synthesizer.

    • categorical branch: only conditioning categorical emotional descriptors (such as happy, sad, etc.)
    • continuous branch: conditioning continuous emotional descriptors (such as arousal, valence, etc.) in addition to categorical emotional descriptors
  2. Conversational TTS: The following branch contains an implementation of Conversational End-to-End TTS for Voice Agent.
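
The two emotional branches differ only in what conditioning signal is injected alongside the encoder output. A minimal PyTorch sketch of that idea is shown below; the module name, dimensions, and default emotion count are illustrative assumptions, not the repo's actual code:

```python
import torch
import torch.nn as nn

class EmotionConditioning(nn.Module):
    """Hypothetical sketch: condition FastSpeech2 encoder output on a
    categorical emotion ID (categorical branch) and, optionally, continuous
    descriptors such as arousal/valence (continuous branch)."""

    def __init__(self, n_emotions: int = 7, hidden: int = 256, n_continuous: int = 2):
        super().__init__()
        self.emotion_emb = nn.Embedding(n_emotions, hidden)     # happy, sad, ...
        self.continuous_proj = nn.Linear(n_continuous, hidden)  # arousal, valence

    def forward(self, enc_out, emotion_id, continuous=None):
        # enc_out: (batch, time, hidden); the condition vector is broadcast
        # over the time axis and added to every encoder frame.
        cond = self.emotion_emb(emotion_id).unsqueeze(1)
        if continuous is not None:
            cond = cond + self.continuous_proj(continuous).unsqueeze(1)
        return enc_out + cond
```

Omitting the `continuous` argument corresponds to the categorical branch; passing arousal/valence values corresponds to the continuous branch.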

Citation

If you would like to use or refer to this implementation, please cite the repo.

@misc{lee2021expressive_fastspeech2,
  author = {Lee, Keon},
  title = {Expressive-FastSpeech2},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/Expressive-FastSpeech2}}
}
