macroustc

0 followers · 4 following

seamless_communication Public
Forked from facebookresearch/seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook Other Updated Jan 11, 2024
phonemizer Public
Forked from bootphon/phonemizer

Simple text to phones converter for multiple languages

Python GNU General Public License v3.0 Updated Jan 11, 2024
piper Public
Forked from rhasspy/piper

A fast, local neural text to speech system

C++ MIT License Updated Dec 23, 2023
OpenVoice Public
Forked from myshell-ai/OpenVoice

Instant voice cloning

Python 1 Other Updated Dec 13, 2023
Qwen-Audio Public
Forked from QwenLM/Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python Other Updated Dec 11, 2023
Amphion Public
Forked from open-mmlab/Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python MIT License Updated Dec 10, 2023
StyleTTS2 Public
Forked from yl4579/StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python MIT License Updated Nov 27, 2023
EmotiVoice Public
Forked from netease-youdao/EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Python Apache License 2.0 Updated Nov 14, 2023
Bert-VITS2 Public
Forked from fishaudio/Bert-VITS2

vits2 backbone with bert

Python GNU Affero General Public License v3.0 Updated Oct 30, 2023
audino Public
Forked from midas-research/audino

Open source audio annotation tool for humans

JavaScript MIT License Updated Oct 25, 2023
UniAudio Public
Forked from yangdongchao/UniAudio

The Open Source Code of UniAudio

Python Updated Oct 6, 2023
NISQA Public
Forked from gabrielmittag/NISQA

NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment

Python MIT License Updated Sep 14, 2023
Speech-Resources Public
Forked from ddlBoJack/Speech-Resources

Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome

Updated Sep 6, 2023
LLaSM Public
Forked from LinkSoul-AI/LLaSM

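The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce.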

Python Apache License 2.0 Updated Aug 31, 2023
VALL-E-X Public
Forked from Plachtaa/VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io

Python MIT License Updated Aug 24, 2023
AudioLDM2 Public
Forked from haoheliu/AudioLDM2

Text-to-Audio/Music Generation

Python Other Updated Aug 14, 2023
torchcrepe Public
Forked from maxrmorrison/torchcrepe

PyTorch implementation of the CREPE pitch tracker

Python MIT License Updated Jul 28, 2023
ultimatevocalremovergui Public
Forked from Anjok07/ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.

Python MIT License Updated Jul 22, 2023
encodec Public
Forked from facebookresearch/encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python MIT License Updated Jun 24, 2023
3D-Speaker Public
Forked from modelscope/3D-Speaker

A repository for single- and multi-modal speaker verification, speaker recognition, and speaker diarization.

Python Apache License 2.0 Updated Jun 12, 2023
audiocraft Public
Forked from facebookresearch/audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python MIT License Updated Jun 11, 2023
pyannote-audio Public
Forked from pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook MIT License Updated Jun 2, 2023
VITS-fast-fine-tuning Public
Forked from Plachtaa/VITS-fast-fine-tuning

This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion

Python Apache License 2.0 Updated May 24, 2023
visqol Public
Forked from google/visqol

Perceptual Quality Estimator for speech and audio

C++ Apache License 2.0 Updated May 18, 2023
DPE Public
Forked from OpenTalker/DPE

[CVPR 2023] DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

Python MIT License Updated Apr 30, 2023
PaddleGAN Public
Forked from PaddlePaddle/PaddleGAN

PaddlePaddle GAN library, including lots of interesting applications like First-Order motion transfer, Wav2Lip, picture repair, image editing, photo2cartoon, image style transfer, GPEN, and so on.

Python Apache License 2.0 Updated Apr 29, 2023
tango Public
Forked from declare-lab/tango

Codes and Model of the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"

Python Other Updated Apr 28, 2023
Parselmouth Public
Forked from YannickJadoul/Parselmouth

Praat in Python, the Pythonic way

C++ GNU General Public License v3.0 Updated Apr 25, 2023
w2v2-age-gender-how-to Public
Forked from audeering/w2v2-age-gender-how-to

How to use our public wav2vec2 age and gender model

Jupyter Notebook MIT License Updated Apr 25, 2023
MiniGPT-4 Public
Forked from Vision-CAIR/MiniGPT-4

MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models

Python BSD 3-Clause "New" or "Revised" License Updated Apr 24, 2023
