Starred repositories
A minimal and universal controller for FLUX.1.
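Polished AI digital human cloning, voice cloning, short-video generation, live streaming (coming soon), AI dubbing, and AI subtitles; available as a Windows installer, web, H5, and mini-program versions; a side-hustle essential. Backend API for an open-source digital human cloning platform.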
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and open-source models.
Enjoy the magic of Diffusion models!
Code to accompany "A Method for Animating Children's Drawings of the Human Figure"
[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Official repository of In-Context LoRA for Diffusion Transformers
Prompt, run, edit, and deploy full-stack web applications
Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
LLM-powered multiagent persona simulation for imagination enhancement and business insights.
The fastest digital human algorithm, now on your desktop.
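“alibabacloud-nls-python-sdk provides access to Alibaba Cloud's intelligent speech services, including speech recognition, speech synthesis, and audio file transcription.”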
Real-time voice interactive digital human, supporting both an end-to-end voice solution (GLM-4-Voice - THG) and a cascaded solution (ASR-LLM-TTS-THG). Appearance and voice are customizable with no training required; voice cloning is supported, with first-packet latency as low as 3 s.
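[Honest wrapper] The strongest wrapper-style digital human system in the Eastern Hemisphere, with separate front end and back end. It can connect to every digital human API on the market, including the 硅基, 飞影, 闪剪, and 壹定 open platforms, and works out of the box. Star the repo to make a friend.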
A powerful tool that translates ComfyUI workflows into executable Python code.
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
📹 A more flexible CogVideoX that can generate videos at any resolution and creates videos from images.
Official implementation of the paper "TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation"
Automate browser-based workflows with LLMs and Computer Vision