Stars
The successful integration of Qwen2-VL-Instruct into the ComfyUI platform has enabled a smooth operation, supporting (but not limited to) text-based queries, video queries, single-image queries, an…
Real-time interactive streaming digital human
Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥
A deep-thinking, web-connected multimodal large model built on the DeepseekR1/Gemini2/GoogleVison2/GoogleSearch APIs.
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
This node provides lip-sync capabilities in ComfyUI using ByteDance's LatentSync model. It allows you to synchronize video lips with audio input.
A locally deployable AI voice toolbox | A user-friendly audio toolkit for voice recognition, voice transcription, voice conversion, etc.
[WIP] The all-in-one inference optimization solution for ComfyUI: universal, flexible, and fast.
[CVPR2025] We present StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference ima…
Custom nodes for using Gigapixel AI in ComfyUI
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
PuLID-Flux ComfyUI implementation
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).
OpenMusic: SOTA Text-to-music (TTM) Generation
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling