DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability

MaxMax2016/DEX-TTS

 
 


DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

This repository is the official implementation of DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability.

In this repository, we provide steps for running DEX-TTS and GeDEX-TTS.

🙏 We recommend you visit our demo site. 🙏

DEX-TTS is a diffusion-based expressive TTS model that uses reference speech. Its overall architecture is shown below:

[Figure: DEX-TTS architecture]

GeDEX-TTS is the general version of DEX-TTS and does not use reference speech. Its overall architecture is shown below:

[Figure: GeDEX-TTS architecture]

Shortcuts

Links to the demo site, the paper, and the code are below.

[👉 Demo]      [📄 Paper]      [💻 DEX-TTS Code]      [💻 GeDEX-TTS Code]     

ToDo

  • BigVGAN vocoder for multi-speaker TTS
  • Multi-GPU training code
  • LibriTTS & simple preprocessing recipes
  • Pre-trained weights
  • Precondition VE & VP
  • Evaluation

Citation


License

This repository will be released under the MIT license.

Thanks to open-source codebases such as RetNet, FastSpeech2, Grad-TTS, DiT, MaskDiT, and EDM, on which this repository is built.


Languages

  • Python 99.4%
  • Cython 0.6%