Stars
A comprehensive Windows-optimized tool for downloading, converting, and quantizing the Qwen2.5-7B-Instruct model to GGUF format.
Official Implementations for Paper - AniDoc: Animation Creation Made Easier
A generative world for general-purpose robotics & embodied AI learning.
Official implementation of “GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting” by Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko,…
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Official implementation of EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
A simple screen parsing tool towards pure vision based GUI agent
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Chat with your documents using Vision Language Models. This repo implements an End to End RAG pipeline with both local and proprietary VLMs
High-quality video and image generation by https://klingai.kuaishou.com and https://klingai.com/. Reverse-engineered API.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Official inference repo for FLUX.1 models
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs.
V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.