Stars
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
A curated list of my favourite music DSP and audio programming resources
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
Reference-aware automatic speech evaluation toolkit
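Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A, articles, and comments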
Full stack, modern web application template. Using FastAPI, React, SQLModel, PostgreSQL, Docker, GitHub Actions, automatic HTTPS and more.
Implementation of Differentiable Digital Signal Processing (DDSP) in PyTorch
A PyTorch autoencoder implementation explained line by line, trained on the MNIST dataset, with the code kept deliberately simple.
An Open Source text-to-speech system built by inverting Whisper.
Conditioning and feature fusion methods such as FiLM, Conditional Layer Norm and AdaIN.
A course for getting into Large Language Models (LLMs), with roadmaps and Colab notebooks.
The official repository of Qwen-Audio (通义千问-Audio), the chat and pretrained large audio language model proposed by Alibaba Cloud.
A curated list of JUCE modules, templates, plugins, oh my!
Collection of tutorials & resources for the C++ library JUCE
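Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.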
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
Finetune MobileSAM with Less Than 4GB RAM!
An Efficient Lexical Analyzer for Chinese
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Vector (and Scalar) Quantization, in PyTorch
Extract the voice and corresponding text
Implementation of Nougat: Neural Optical Understanding for Academic Documents