Starred repositories
A latent text-to-image diffusion model
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
🔊 Text-Prompted Generative Audio Model
Google Research
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
Inpaint anything using Segment Anything and inpainting models.
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
A neural network model that detects five different emotions in male and female speech audio. (Deep Learning, NLP, Python)
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms
Third-party audio effects plugins as differentiable layers within deep neural networks.
AdaSpeech: Adaptive Text to Speech for Custom Voice