HiFi-SAN: Enhancing Speech Synthesis with Slicing Adversarial Networks

HiFi-SAN builds on HiFi-GAN and applies Slicing Adversarial Networks (SAN) to its discriminator to improve the efficiency and fidelity of speech synthesis. The approach draws on BigVSAN, which applied SAN to the BigVGAN vocoder.
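As a rough illustration of the SAN idea (not code from this repository), the discriminator's last layer can be viewed as an inner product between a feature vector h(x) and a direction omega constrained to unit length. As we understand the SAN formulation, the direction is scored by how far it separates the mean real and mean generated features, while the feature extractor keeps the usual GAN loss. The pure-Python sketch below only computes these values. In a PyTorch implementation, the two objectives would be kept apart by stopping gradients (for example with `detach()`). All function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def normalize(v: Vec) -> Vec:
    """Project v onto the unit sphere; SAN constrains the direction omega this way."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vs: List[Vec]) -> Vec:
    """Element-wise mean of a batch of feature vectors."""
    return [sum(col) / len(vs) for col in zip(*vs)]


def san_logits(omega: Vec, feats: List[Vec]) -> List[float]:
    """Discriminator outputs <omega, h(x)> with omega normalized to unit length."""
    w = normalize(omega)
    return [dot(w, h) for h in feats]


def direction_objective(omega: Vec, feats_real: List[Vec], feats_fake: List[Vec]) -> float:
    """Quantity the direction maximizes while h is held fixed:
    E_real[<omega, h>] - E_fake[<omega, h>]."""
    w = normalize(omega)
    return dot(w, mean_vec(feats_real)) - dot(w, mean_vec(feats_fake))


def best_direction(feats_real: List[Vec], feats_fake: List[Vec]) -> Vec:
    """For fixed features, the maximizing unit direction is the normalized mean difference."""
    diff = [r - f for r, f in zip(mean_vec(feats_real), mean_vec(feats_fake))]
    return normalize(diff)
```

For example, with real features [[1, 0], [3, 0]] and generated features [[0, 0], [0, 2]], the mean difference is [2, -1]. `best_direction` returns that vector scaled to unit length, and `direction_objective` then reaches its maximum value of sqrt(5).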

Key Changes

Integration of SANConv2d layers into the discriminator.
Refactored DiscriminatorP_SAN to use SAN, improving parameter scaling and normalization during training.
Updated MultiPeriodDiscriminator to include both SAN-based and traditional GAN-based discriminators for robust adversarial training.
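The parameter scaling and normalization attributed to the SANConv2d layers can be sketched as a reparameterization. Under this assumption, each output channel's kernel is stored as a unit-norm direction plus a separate scale, and the kernel actually used by the convolution is their product. This is a minimal pure-Python illustration, not the repository's SANConv2d. The helper names are hypothetical, and a real layer would hold these values as PyTorch parameters and apply them with a 2D convolution.

```python
import math
from typing import List, Tuple

Kernel = List[float]  # one output channel's kernel, flattened to in_ch * kh * kw values


def l2_norm(v: Kernel) -> float:
    return math.sqrt(sum(x * x for x in v))


def split_scale_direction(weight: List[Kernel]) -> Tuple[List[float], List[Kernel]]:
    """Split each output channel's kernel into its L2 norm (scale) and a unit-norm direction."""
    scales, directions = [], []
    for kernel in weight:
        norm = l2_norm(kernel)
        if norm == 0.0:
            raise ValueError("cannot split an all-zero kernel")
        scales.append(norm)
        directions.append([x / norm for x in kernel])
    return scales, directions


def renormalize(directions: List[Kernel]) -> List[Kernel]:
    """Re-project every direction onto the unit sphere, e.g. after an optimizer step moved it off."""
    out = []
    for d in directions:
        n = l2_norm(d)
        out.append([x / n for x in d])
    return out


def effective_weight(scales: List[float], directions: List[Kernel]) -> List[Kernel]:
    """Recombine scale * direction into the kernel a convolution would actually use."""
    return [[s * x for x in d] for s, d in zip(scales, directions)]
```

Keeping the scale as its own parameter lets the layer learn output magnitude freely while each direction stays on the unit sphere, which is the constraint SAN places on its final layer.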

Usage

Follow the HiFi-GAN instructions to train the HiFi-SAN model.

Acknowledgements

This work builds on previous projects, including HiFi-GAN, BigVGAN, and BigVSAN.

Repository: blaisewf/HiFi-SAN