Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 2,756 254 Updated Sep 25, 2024

yannqi / Draw-an-Audio-Code

Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.

38 1 Updated Sep 11, 2024

xingchensong / S3Tokenizer

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 102 9 Updated Sep 29, 2024

GitYCC / g2pW

Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin (Bopomofo) or Pinyin (INTERSPEECH 2022)

Python 278 38 Updated Jun 16, 2024

hacksider / Deep-Live-Cam

Real-time face swap and one-click video deepfake with only a single image

Python 37,786 5,400 Updated Oct 6, 2024

FunAudioLLM / CosyVoice

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 5,339 551 Updated Sep 29, 2024

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

Python 2,844 268 Updated Sep 25, 2024

luosiallen / Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Python 151 18 Updated May 29, 2024

yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

Python 84,201 6,564 Updated Oct 7, 2024

Tele-AI / TeleSpeech-ASR

Python 476 39 Updated Jun 7, 2024

bootphon / phonemizer

Simple text-to-phones converter for multiple languages

Python 1,209 168 Updated Sep 26, 2024

JosephPai / Awesome-Talking-Face

📖 A curated list of resources dedicated to talking face.

1,290 109 Updated Oct 3, 2024

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Python 517 45 Updated Oct 5, 2024

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 31,261 3,386 Updated Sep 21, 2024

huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Python 25,437 5,270 Updated Oct 8, 2024

jasonppy / VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 7,534 739 Updated Jun 24, 2024

hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Python 21,776 2,111 Updated Aug 9, 2024

mini-sora / minisora

MiniSora: A community that aims to explore the implementation path and future development direction of Sora.

Python 1,180 149 Updated Sep 25, 2024

PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

Python 11,296 1,009 Updated Oct 6, 2024

chenzomi12 / AISystem

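AISystem mainly refers to AI systems, covering full-stack, low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.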

Jupyter Notebook 10,683 1,539 Updated Sep 29, 2024

ChenHsing / Awesome-Video-Diffusion-Models

[CSUR] A Survey on Video Diffusion Models

1,734 88 Updated Oct 8, 2024

Yutong-Zhou-cv / Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2,104 187 Updated Aug 20, 2024

facebookresearch / jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

Python 2,632 251 Updated Aug 9, 2024

xianshang33 / llm-paper-daily

Daily updated LLM papers. Subscriptions welcome 👏 If you like it, give it a 🌟

940 35 Updated Jul 31, 2024

QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 4,888 373 Updated Aug 7, 2024

macroustc

Stars

FireRedTeam / FireRedTTS

kyutai-labs / moshi

FerdinandZhong / punctuator

xieyuankun / Codecfake

robin1001 / nn-vad

gpt-omni / mini-omni