Stars
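This project implements Wav2Lip video lip-sync synthesis based on SadTalker. Lip movements are generated from speech, with a video file as the input. A configurable enhancement method is then applied to the face region to sharpen the synthesized lip (face) area and improve the clarity of the generated lips. Finally, the DAIN deep-learning frame-interpolation algorithm inserts extra frames into the generated video. These frames fill in the lip motion between synthesized frames, so the result looks smoother, more realistic, and more natural.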
High-Fidelity Lip-Syncing with Wav2Lip and Real-ESRGAN
🎨 ComfyUI standalone pack with 40+ preinstalled custom nodes (SD models not included).
🧊 ComfyUI-3D-Pack, pre-built for Windows (Comfy3D all-in-one package).
🐳 Dockerfile for 🎨 ComfyUI, with container images and launch scripts.
Text-to-Music Generation with Rectified Flow Transformers
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Multilingual Voice Understanding Model
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
MagicAvatar: Multimodal Avatar Generation and Animation
[ICLR 2025] CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M trainable parameters) …
A high-throughput and memory-efficient inference and serving engine for LLMs
Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes (NeurIPS 2024, official code)
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
PantoMatrix: Generating Face and Body Animation from Speech
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks