  • Foundational Models for State-of-the-Art Speech and Text Translation

    Jupyter Notebook Other Updated Jan 11, 2024
  • phonemizer Public

    Forked from bootphon/phonemizer

    Simple text to phones converter for multiple languages

    Python GNU General Public License v3.0 Updated Jan 11, 2024
  • piper Public

    Forked from rhasspy/piper

    A fast, local neural text to speech system

    C++ MIT License Updated Dec 23, 2023
  • OpenVoice Public

    Forked from myshell-ai/OpenVoice

    Instant voice cloning

    Python 1 Other Updated Dec 13, 2023
  • Qwen-Audio Public

    Forked from QwenLM/Qwen-Audio

    The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

    Python Other Updated Dec 11, 2023
  • Amphion Public

    Forked from open-mmlab/Amphion

    Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

    Python MIT License Updated Dec 10, 2023
  • StyleTTS2 Public

    Forked from yl4579/StyleTTS2

    StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

    Python MIT License Updated Nov 27, 2023
  • EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

    Python Apache License 2.0 Updated Nov 14, 2023
  • Bert-VITS2 Public

    Forked from fishaudio/Bert-VITS2

    vits2 backbone with bert

    Python GNU Affero General Public License v3.0 Updated Oct 30, 2023
  • audino Public

    Forked from midas-research/audino

    Open source audio annotation tool for humans

    JavaScript MIT License Updated Oct 25, 2023
  • UniAudio Public

    Forked from yangdongchao/UniAudio

    The Open Source Code of UniAudio

    Python Updated Oct 6, 2023
  • NISQA Public

    Forked from gabrielmittag/NISQA

    NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment

    Python MIT License Updated Sep 14, 2023
  • Speech-related labs, companies, resources, internships, and more; recommendations and self-recommendations welcome

    Updated Sep 6, 2023
  • LLaSM Public

    Forked from LinkSoul-AI/LLaSM

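    The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.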

    Python Apache License 2.0 Updated Aug 31, 2023
  • VALL-E-X Public

    Forked from Plachtaa/VALL-E-X

    An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io

    Python MIT License Updated Aug 24, 2023
  • AudioLDM2 Public

    Forked from haoheliu/AudioLDM2

    Text-to-Audio/Music Generation

    Python Other Updated Aug 14, 2023
  • Pytorch implementation of the CREPE pitch tracker

    Python MIT License Updated Jul 28, 2023
  • GUI for a Vocal Remover that uses Deep Neural Networks.

    Python MIT License Updated Jul 22, 2023
  • State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

    Python MIT License Updated Jun 24, 2023
  • A repository for single- and multi-modal speaker verification, speaker recognition, and speaker diarization.

    Python Apache License 2.0 Updated Jun 12, 2023
  • Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

    Python MIT License Updated Jun 11, 2023
  • Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

    Jupyter Notebook MIT License Updated Jun 2, 2023
  • This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion

    Python Apache License 2.0 Updated May 24, 2023
  • visqol Public

    Forked from google/visqol

    Perceptual Quality Estimator for speech and audio

    C++ Apache License 2.0 Updated May 18, 2023
  • DPE Public

    Forked from OpenTalker/DPE

    [CVPR 2023] DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

    Python MIT License Updated Apr 30, 2023
  • PaddlePaddle GAN library, including lots of interesting applications like First-Order motion transfer, Wav2Lip, picture repair, image editing, photo2cartoon, image style transfer, GPEN, and so on.

    Python Apache License 2.0 Updated Apr 29, 2023
  • tango Public

    Forked from declare-lab/tango

    Codes and Model of the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"

    Python Other Updated Apr 28, 2023
  • Praat in Python, the Pythonic way

    C++ GNU General Public License v3.0 Updated Apr 25, 2023
  • How to use our public wav2vec2 age and gender model

    Jupyter Notebook MIT License Updated Apr 25, 2023
  • MiniGPT-4 Public

    Forked from Vision-CAIR/MiniGPT-4

    MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models

    Python BSD 3-Clause "New" or "Revised" License Updated Apr 24, 2023