forked from jik876/hifi-gan

HiFi-SAN: Slicing Adversarial Networks for Efficient and High Fidelity Speech Synthesis

blaisewf/HiFi-SAN

 
 

HiFi-SAN: Enhancing Speech Synthesis with Slicing Adversarial Networks

HiFi-SAN applies Slicing Adversarial Networks (SAN) to HiFi-GAN with the aim of improving the efficiency and fidelity of speech synthesis. SAN is integrated into the discriminator, following the approach of the BigVSAN paper, which in turn builds on BigVGAN.

Key Changes

  1. Integration of SANConv2d layers into the discriminator.
  2. Refactored DiscriminatorP_SAN to employ SAN, allowing improved parameter scaling and normalization during training.
  3. Updated MultiPeriodDiscriminator to include both SAN-based and traditional GAN-based discriminators for robust adversarial training.
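The first two changes can be illustrated with a minimal sketch of a SAN-style convolution. This is not the repository's code: it assumes the layer follows the general SAN idea of splitting the kernel into a unit-norm direction and a separately learned scale, and the `flg_train` flag and the `out_fun` / `out_dir` names are assumptions made for illustration. The actual `SANConv2d` in this repository may differ in signature and details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SANConv2d(nn.Conv2d):
    """Illustrative SAN-style Conv2d (a sketch, not the repository's layer).

    The kernel is stored as a direction that is re-normalised to unit L2 norm
    on every call, plus one learned scale per output channel.
    """

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super().__init__(in_channels, out_channels, kernel_size,
                         stride=stride, padding=padding, bias=False)
        # Split the initial kernel into direction (unit norm) and scale (its norm).
        w = self.weight.detach()
        norm = w.flatten(1).norm(dim=1).clamp_min(1e-12)
        self.weight = nn.Parameter(w / norm.view(-1, 1, 1, 1))
        self.scale = nn.Parameter(norm.clone())

    def _direction(self):
        # Project the stored kernel back onto the unit sphere.
        w = self.weight
        return w / w.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)

    def _conv(self, x, kernel):
        return F.conv2d(x, kernel, None, self.stride, self.padding)

    def forward(self, x, flg_train=False):
        direction = self._direction()
        scale = self.scale.view(1, -1, 1, 1)
        if flg_train:
            # out_fun: gradients reach the features and the scale, not the direction.
            out_fun = self._conv(x, direction.detach()) * scale
            # out_dir: gradients reach only the direction.
            out_dir = self._conv(x.detach(), direction) * scale.detach()
            return out_fun, out_dir
        return self._conv(x, direction) * scale


# Usage: a final scoring layer over hypothetical discriminator features.
torch.manual_seed(0)
layer = SANConv2d(4, 1, kernel_size=(3, 1), padding=(1, 0))
x = torch.randn(2, 4, 10, 3, requires_grad=True)
out_fun, out_dir = layer(x, flg_train=True)  # used when updating the discriminator
score = layer(x)                             # used when updating the generator
```

The two training outputs have identical values and differ only in where gradients flow, so a discriminator loss can apply one objective to the feature extractor (through `out_fun`) and another to the direction (through `out_dir`).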

Usage

Training works the same way as in HiFi-GAN: follow the instructions in the upstream jik876/hifi-gan repository to train the HiFi-SAN model.

Acknowledgements

This work builds on the foundations of previous projects, notably HiFi-GAN (jik876/hifi-gan), from which this repository is forked, and BigVSAN.
