Audio AI
🔊 Text-Prompted Generative Audio Model
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Suno AI's Bark model in C/C++ for fast text-to-speech generation
🔊 Text-Prompted Generative Audio Model with Gradio
🚀 BARK INFINITY GUI CMD 🎶 Powered Up Bark Text-prompted Generative Audio Model
Robust Speech Recognition via Large-Scale Weak Supervision
A list of demo websites for automatic music generation research
A timeline of the latest AI models for audio generation, starting in 2023!
TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Real-time transcription of audio, integrated with ChatGPT for interactive use. Save, load, and append transcripts for effective context management in conversations.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Record voice notes, then transcribe, summarize, and get tasks from them
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
🚀 BARK INFINITY 🎶 Power Up The Bark Text-prompted Generative Audio Model
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Instant voice cloning by MIT and MyShell.
A nearly-live implementation of OpenAI's Whisper.
As little as 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Easily train a good VC model with 10 minutes or less of voice data!
Speak (speech-to-text) to Ollama LLMs in any language - Streamlit app
Voice Emotion Detector that detects emotion from speech audio using one-dimensional CNNs (convolutional neural networks), built with Keras and TensorFlow in a Jupyter Notebook.
Fine-tune ChatGPT to write lyrics in the style of your favorite artist