Stars
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
Papers, code, and resources for speech language models and end-to-end speech dialogue systems.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation.
🔊 Text-Prompted Generative Audio Model (Bark; a minimal usage sketch follows this list)
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
[ACM MM 2024] FlashSpeech: Efficient Zero-Shot Speech Synthesis
[ACM MM 2023] CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".
FreeU: Free Lunch in Diffusion U-Net (CVPR 2024 Oral)
SEED-Story: Multimodal Long Story Generation with Large Language Model
Integration for the OpenAI API in Unreal Engine
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A curated collection of open-source Chinese large language models, focusing on smaller-scale models that can be privately deployed at low training cost, covering base models, vertical-domain fine-tuning and applications, datasets, and tutorials.
Awesome-LLM: a curated list of Large Language Models
Multilingual large-scale voice generation model, providing full-stack inference, training, and deployment capabilities.
One minute of voice data can be enough to train a good TTS model! (few-shot voice cloning)
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc. (FunASR; a minimal usage sketch follows this list)
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet for a given image (a zero-shot sketch follows this list)
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Official pytorch implementation of the paper: "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021)
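
The text-prompted generative audio model starred above is suno-ai/bark. A minimal text-to-audio sketch following the API documented in its README; the prompt text and output filename are placeholders:

```python
# Minimal Bark sketch: synthesize audio from a text prompt.
# API per the suno-ai/bark README; prompt and output path are placeholders.
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # download and cache the model weights
audio = generate_audio("Hello, this is a text-prompted audio test.")
write_wav("bark_out.wav", SAMPLE_RATE, audio)  # Bark outputs 24 kHz mono audio
```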
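
The end-to-end speech recognition toolkit starred above is FunASR. A minimal transcription sketch using its AutoModel interface as shown in the project README; the model names are released FunASR models, and "audio.wav" is a placeholder:

```python
# Minimal FunASR sketch: transcribe an audio file with Paraformer.
# AutoModel interface per the FunASR README; "audio.wav" is a placeholder.
from funasr import AutoModel

model = AutoModel(model="paraformer-zh",  # Chinese ASR model
                  vad_model="fsmn-vad",   # voice activity detection
                  punc_model="ct-punc")   # punctuation restoration
res = model.generate(input="audio.wav")
print(res)  # list of result dicts containing the recognized text
```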
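
For the CLIP entry, a minimal zero-shot prediction sketch using OpenAI's published clip package; the image path and candidate captions are placeholders:

```python
# Minimal CLIP sketch: score candidate captions against an image.
# Uses the openai/CLIP package; image path and captions are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(probs)  # probability that each caption matches the image
```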