Stars
Agent Laboratory is an end-to-end autonomous research workflow meant to assist you, the human researcher, in implementing your research ideas.
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
High-resolution models for human tasks.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
Official code for "RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control"
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek3, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inter…
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
A high-throughput and memory-efficient inference and serving engine for LLMs
DeepSeek-VL: Towards Real-World Vision-Language Understanding
[WIP] Layer Diffusion for WebUI (via Forge)
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Transparent Image Layer Diffusion using Latent Transparency
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Official Code for MotionCtrl [SIGGRAPH 2024]
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Code for the paper "Pix2Video: Video Editing using Image Diffusion"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A feature-rich command-line audio/video downloader
Official and maintained implementation of the paper "Differentiable JPEG: The Devil is in the Details" [WACV 2024].
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.