Stars
Whisper realtime streaming for long speech-to-text transcription and translation
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
A high-quality speech analysis, manipulation and synthesis system
Oboe is a C++ library that makes it easy to build high-performance audio apps on Android.
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
UT-Sarulab MOS prediction system using SSL models
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
Singing voice change based on Whisper, and LoRA for singing voice cloning
PyTorch implementation of the paper: Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
Unofficial implementation of SpecTNT in PyTorch
ImageBind: One Embedding Space to Bind Them All
A modern, cross-platform, multi-threaded, and general purpose filesystem and disk-usage utility that is aware of .gitignore and hidden file rules.
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
🔊 Text-Prompted Generative Audio Model
A timeline of the latest AI models for audio generation, starting in 2023!
Easily train a good VC model with voice data <= 10 mins!
Port of OpenAI's Whisper model in C/C++
Audio generation using diffusion models, in PyTorch.
Neural network-based singing voice synthesis library for research