Stars
The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
Real-time face swap and one-click video deepfake with only a single image
A generative speech model for daily dialogue.
Open-Sora: Democratizing Efficient Video Production for All
Realtime Voice Changer
A collaboration friendly studio for NeRFs
A framework to enable multimodal models to operate a computer.
so-vits-svc fork with realtime support, improved interface and more features.
Pythonic AI generation of images and videos
EmotiVoice: a Multi-Voice and Prompt-Controlled TTS Engine
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
Build real-time multimodal AI applications
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
[ICLR 2024 Oral] Generative Gaussian Splatting for Efficient 3D Content Creation
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
AnimateDiff for AUTOMATIC1111 Stable Diffusion WebUI
Improved AnimateDiff for ComfyUI and Advanced Sampling Support
Generative models for conditional audio generation
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
An extensive node suite that enables ComfyUI to process 3D inputs (Mesh & UV Texture, etc) using cutting edge algorithms (3DGS, NeRF, etc.)
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
The official Python API for ElevenLabs Text to Speech.
A Unified Framework for Surface Reconstruction