HiFi-SAN applies Slicing Adversarial Networks (SAN) to HiFi-GAN to improve the efficiency and fidelity of speech synthesis. SAN is integrated into the discriminator, following the approach of the BigVSAN paper, which applies SAN to BigVGAN.
- Integration of `SANConv2d` layers into the discriminator.
- Refactored `DiscriminatorP_SAN` to employ SAN, allowing improved parameter scaling and normalization during training.
- Updated `MultiPeriodDiscriminator` to include both SAN-based and traditional GAN-based discriminators for robust adversarial training.
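To make the idea behind a SAN-style output layer more concrete, here is a minimal PyTorch sketch. It is not the repository's `SANConv2d`. The class name `SANConv2dSketch`, the `is_training` flag, and the tensor sizes are assumptions chosen for illustration. The sketch shows the core idea as commonly described for SAN: the final convolution's weight is split into a unit-norm direction and a learnable scale, and during training the layer returns two outputs so that gradients for the features and scale are kept separate from gradients for the direction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SANConv2dSketch(nn.Conv2d):
    """Hypothetical final discriminator layer with separate direction and scale."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super().__init__(in_channels, out_channels, kernel_size,
                         stride=stride, padding=padding, bias=False)
        # Split the initial weight into a unit-norm direction and a per-channel scale.
        w = self.weight.detach()
        norm = w.pow(2).sum(dim=(1, 2, 3), keepdim=True).sqrt().clamp_min(1e-12)
        self.weight = nn.Parameter(w / norm)
        self.scale = nn.Parameter(norm.view(out_channels))

    def _unit_weight(self):
        # Re-normalize on every call so the direction stays on the unit sphere.
        w = self.weight
        norm = w.pow(2).sum(dim=(1, 2, 3), keepdim=True).sqrt().clamp_min(1e-12)
        return w / norm

    def forward(self, x, is_training=False):
        direction = self._unit_weight()
        scale = self.scale.view(1, self.out_channels, 1, 1)
        if is_training:
            # Gradients reach the input features and the scale, not the direction.
            out_fun = F.conv2d(x, direction.detach(), None, self.stride, self.padding) * scale
            # Gradients reach only the direction.
            out_dir = F.conv2d(x.detach(), direction, None, self.stride, self.padding) * scale.detach()
            return out_fun, out_dir
        return F.conv2d(x, direction, None, self.stride, self.padding) * scale


torch.manual_seed(0)
# Sizes are illustrative only: 32 feature channels, a (3, 1) kernel.
layer = SANConv2dSketch(32, 1, kernel_size=(3, 1), padding=(1, 0))
features = torch.randn(4, 32, 50, 2)
out_fun, out_dir = layer(features, is_training=True)
out_eval = layer(features)
print(out_fun.shape, out_dir.shape, out_eval.shape)
```

In SAN-style training, the first output typically feeds the usual adversarial loss while the second drives a separate objective for the direction. How HiFi-SAN wires these outputs into its losses is not shown in this sketch.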
Follow the HiFi-GAN instructions to train the HiFi-SAN model.
This work builds on the foundations of previous projects: