seamless_communication Public
Forked from facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
Jupyter Notebook · Other license · Updated Jan 11, 2024
phonemizer Public
Forked from bootphon/phonemizer
Simple text to phones converter for multiple languages
Python · GNU General Public License v3.0 · Updated Jan 11, 2024
piper Public
Forked from rhasspy/piper
A fast, local neural text to speech system
C++ · MIT License · Updated Dec 23, 2023
OpenVoice Public
Forked from myshell-ai/OpenVoice
Instant voice cloning
Qwen-Audio Public
Forked from QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio), the chat and pretrained large audio language model proposed by Alibaba Cloud.
Python · Other license · Updated Dec 11, 2023
Amphion Public
Forked from open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Python · MIT License · Updated Dec 10, 2023
StyleTTS2 Public
Forked from yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Python · MIT License · Updated Nov 27, 2023
EmotiVoice Public
Forked from netease-youdao/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Python · Apache License 2.0 · Updated Nov 14, 2023
Bert-VITS2 Public
Forked from fishaudio/Bert-VITS2
VITS2 backbone with BERT
Python · GNU Affero General Public License v3.0 · Updated Oct 30, 2023
audino Public
Forked from midas-research/audino
Open source audio annotation tool for humans
JavaScript · MIT License · Updated Oct 25, 2023
UniAudio Public
Forked from yangdongchao/UniAudio
The Open Source Code of UniAudio
Python · Updated Oct 6, 2023
NISQA Public
Forked from gabrielmittag/NISQA
NISQA: Non-Intrusive Speech Quality and TTS Naturalness Assessment
Python · MIT License · Updated Sep 14, 2023
Speech-Resources Public
Forked from ddlBoJack/Speech-Resources
Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome.
Updated Sep 6, 2023
LLaSM Public
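Forked from LinkSoul-AI/LLaSM
The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.
Python · Apache License 2.0 · Updated Aug 31, 2023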
VALL-E-X Public
Forked from Plachtaa/VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io
Python · MIT License · Updated Aug 24, 2023
AudioLDM2 Public
Forked from haoheliu/AudioLDM2
Text-to-Audio/Music Generation
Python · Other license · Updated Aug 14, 2023
torchcrepe Public
Forked from maxrmorrison/torchcrepe
PyTorch implementation of the CREPE pitch tracker
Python · MIT License · Updated Jul 28, 2023
ultimatevocalremovergui Public
Forked from Anjok07/ultimatevocalremovergui
GUI for a vocal remover that uses deep neural networks.
Python · MIT License · Updated Jul 22, 2023
encodec Public
Forked from facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Python · MIT License · Updated Jun 24, 2023
3D-Speaker Public
Forked from modelscope/3D-Speaker
A repository for single- and multi-modal speaker verification, speaker recognition, and speaker diarization.
Python · Apache License 2.0 · Updated Jun 12, 2023
audiocraft Public
Forked from facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Python · MIT License · Updated Jun 11, 2023
pyannote-audio Public
Forked from pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Jupyter Notebook · MIT License · Updated Jun 2, 2023
VITS-fast-fine-tuning Public
Forked from Plachtaa/VITS-fast-fine-tuning
This repo is a pipeline of VITS fine-tuning for fast speaker-adaptation TTS and many-to-many voice conversion.
Python · Apache License 2.0 · Updated May 24, 2023
visqol Public
Forked from google/visqol
Perceptual Quality Estimator for speech and audio
C++ · Apache License 2.0 · Updated May 18, 2023
DPE Public
Forked from OpenTalker/DPE
[CVPR 2023] DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
Python · MIT License · Updated Apr 30, 2023
PaddleGAN Public
Forked from PaddlePaddle/PaddleGAN
PaddlePaddle GAN library, including lots of interesting applications like First-Order motion transfer, Wav2Lip, picture repair, image editing, photo2cartoon, image style transfer, GPEN, and so on.
Python · Apache License 2.0 · Updated Apr 29, 2023
tango Public
Forked from declare-lab/tango
Code and model for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"
Python · Other license · Updated Apr 28, 2023
Parselmouth Public
Forked from YannickJadoul/Parselmouth
Praat in Python, the Pythonic way
C++ · GNU General Public License v3.0 · Updated Apr 25, 2023
w2v2-age-gender-how-to Public
Forked from audeering/w2v2-age-gender-how-to
How to use our public wav2vec2 age and gender model
Jupyter Notebook · MIT License · Updated Apr 25, 2023
MiniGPT-4 Public
Forked from Vision-CAIR/MiniGPT-4
MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
Python · BSD 3-Clause "New" or "Revised" License · Updated Apr 24, 2023