- Tsinghua University, Department of Electronic Engineering
- Beijing, China
Stars
A high-performance, POSIX-ish Amazon S3 file system written in Go
Evaluation Protocol for Large-Scale Zero-Shot TTS Literature
GitHub Pages template based on HTML and Markdown for personal, portfolio-based websites.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
The official implementation of the paper "SWIM: Short-Window CNN Integrated with Mamba for EEG-Based Auditory Spatial Attention Decoding"
Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updated every 12 hours)
Learn fast, scalable, and calibrated measures of uncertainty using neural networks!
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
Implementation of Denoising Diffusion Probabilistic Model in Pytorch
Generative models for conditional audio generation
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Reference-aware automatic speech evaluation toolkit
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
Zero-Shot Speech Editing and Text-to-Speech in the Wild
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech); reproduced demo: https://lifeiteng.github.io/valle/index.html
An unofficial PyTorch implementation of the audio LM VALL-E
Hackable and optimized Transformers building blocks, supporting a composable construction.
Fast and memory-efficient exact attention
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
A high-throughput and memory-efficient inference and serving engine for LLMs