ZD-ai4x

💭

I may be slow to respond.

Dian Zhao ZD-ai4x

1 follower · 3 following

ai4x

Stars

Avatar

53 repositories

Kedreamix / Linly-Talker

Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction…

Python 2,317 383 Updated Jan 8, 2025

yerfor / MimicTalk

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes; NeurIPS 2024; Official code

Python 552 67 Updated Oct 16, 2024

FunAudioLLM / CosyVoice

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 10,554 1,027 Updated Feb 11, 2025

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

Python 4,414 390 Updated Jan 8, 2025

edwko / OuteTTS

Interface for OuteTTS models.

Python 920 79 Updated Feb 14, 2025

usefulsensors / moonshine

Fast and accurate automatic speech recognition (ASR) for edge devices

Python 2,557 130 Updated Feb 4, 2025

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 9,613 1,290 Updated Feb 14, 2025

revdotcom / reverb

Open source inference code for Rev's model

Python 372 25 Updated Jan 17, 2025

lovemefan / SenseVoice-python

SenseVoice-python: An enterprise-grade open-source multi-language ASR system from FunASR, running on onnxruntime

Python 83 12 Updated Sep 24, 2024

Open-LLM-VTuber / Open-LLM-VTuber

Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms

Python 2,233 219 Updated Feb 14, 2025

Henry-23 / VideoChat

Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cascaded solutions (ASR-LLM-TTS-THG). Appearance and voice are customizable with no training required; voice cloning is supported, with first-packet latency as low as 3 s.

Python 673 96 Updated Nov 15, 2024

jdh-algo / JoyVASA

Diffusion-based Portrait and Animal Animation

Python 653 57 Updated Jan 13, 2025

antgroup / echomimic_v2

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Python 2,703 317 Updated Jan 27, 2025

Rudrabha / Wav2Lip

This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs

Python 11,334 2,381 Updated Feb 10, 2025

Zz-ww / SadTalker-Video-Lip-Sync

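This project implements Wav2lip-style video lip synthesis based on SadTalkers. It generates lip shapes driven by speech from a video file and applies a configurable enhancement to the face region, sharpening the synthesized lip (face) area and improving the clarity of the generated lips. It uses the DAIN deep-learning frame-interpolation algorithm to fill in frames of the generated video, smoothing the motion transitions between frames so that the synthesized lips look more fluid, realistic, and natural.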

Python 1,928 333 Updated Jun 4, 2023

lipku / LiveTalking

Real time interactive streaming digital human

Python 4,530 662 Updated Feb 7, 2025

anliyuan / Ultralight-Digital-Human

An ultra-lightweight digital human model that can run in real time on mobile devices

Python 1,510 222 Updated Nov 13, 2024

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 40,449 4,520 Updated Feb 14, 2025

CyberAgentAILab / TANGO

[ICLR 2025 Oral] TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation

Python 895 112 Updated Oct 29, 2024

ricky0123 / vad

Voice activity detector (VAD) for the browser with a simple API

TypeScript 1,080 167 Updated Jan 19, 2025

xszyou / Fay

Fay is an open-source digital human framework integrating language models and digital characters. It offers retail, assistant, and agent versions for diverse applications like virtual shopping guid…

JavaScript 9,898 1,872 Updated Feb 13, 2025

TMElyralab / MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Python 3,439 435 Updated Nov 27, 2024

OpenTalker / SadTalker

[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Python 12,314 2,291 Updated Jun 26, 2024

RapidAI / RapidASR

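📣 A commercial-grade open-source automatic speech recognition library that works out of the box, supports all platforms, and recognizes mixed Chinese and English speech. A cross-platform implementation of ASR inference. It's based on ONNXRuntime and FunASR. We provide a set of easier APIs to call ASR models.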

C++ 525 62 Updated May 15, 2024

modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 8,093 838 Updated Feb 13, 2025

kleinlee / DH_live

A digital human that everyone can use

Python 984 207 Updated Feb 9, 2025

wan-h / awesome-digital-human-live2d

Awesome Digital Human

TypeScript 1,166 120 Updated Jan 8, 2025

Zejun-Yang / AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Python 4,830 600 Updated Jul 2, 2024

yerfor / GeneFacePlusPlus

GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code

Python 1,642 240 Updated Oct 18, 2024

Holasyb918 / PersonaTalk_Hack

PersonaTalk Hack

Python 13 Updated Jan 10, 2025
