Stars
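Dive into Deep Learning (《动手学深度学习》): written for Chinese readers, runnable, and open for discussion. The Chinese and English editions are used for teaching at more than 500 universities in more than 70 countries.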
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
A sound cloning tool with a web interface that records audio using your voice or any other sound.
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
A Frida tool that dumps DEX files from memory to help security engineers analyze malware.
YuE: Open Full-song Music Generation Foundation Model, similar to Suno.ai but open.
Stable Diffusion for real-time music generation
A Python package to analyze and compare voices with deep learning
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
The official repo of Qwen-Audio (通义千问-Audio), the chat and pretrained large audio language model proposed by Alibaba Cloud.
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
Command-line utility for forced alignment using Kaldi
Implementation of NaturalSpeech 2, a Zero-shot Speech and Singing Synthesizer, in PyTorch
Zero-shot voice conversion and singing voice conversion, with real-time support
An implementation of FastSpeech based on PyTorch.
Speech, Language, Audio, Music Processing with Large Language Model
Audio Dataset for training CLAP and other models
Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin (Bopomofo) or Pinyin (INTERSPEECH 2022)
Official PyTorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)
Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Scripts to help with using the Montreal Forced Aligner, mainly for converting Hanzi characters to Pinyin and extracting pause times from .TextGrid files.