- Tsinghua University, Department of Electronic Engineering
- Beijing, China
Stars
A high-performance, POSIX-ish Amazon S3 file system written in Go
Evaluation Protocol for Large-Scale Zero-Shot TTS Literature
GitHub Pages template based on HTML and Markdown for personal, portfolio-based websites.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
The official implementation of the paper "SWIM: Short-Window CNN Integrated with Mamba for EEG-Based Auditory Spatial Attention Decoding"
Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updated every 12 hours)
Learn fast, scalable, and calibrated measures of uncertainty using neural networks!
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
Implementation of Denoising Diffusion Probabilistic Model in Pytorch
Generative models for conditional audio generation
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Reference-aware automatic speech evaluation toolkit
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
Zero-Shot Speech Editing and Text-to-Speech in the Wild
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech); reproduced demo: https://lifeiteng.github.io/valle/index.html
An unofficial PyTorch implementation of the audio LM VALL-E
Hackable and optimized Transformers building blocks, supporting a composable construction.
Fast and memory-efficient exact attention
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
A high-throughput and memory-efficient inference and serving engine for LLMs