Stars
Official repository of "Visual-RFT: Visual Reinforcement Fine-Tuning"
Official implementation of Inductive Moment Matching
[NeurIPS'24 Spotlight] Observational Scaling Laws
Enjoy the magic of Diffusion models!
Wan: Open and Advanced Large-Scale Video Generative Models
[ICLR 2025] Reconstructive Visual Instruction Tuning
MoBA: Mixture of Block Attention for Long-Context LLMs
Official PyTorch implementation for "Large Language Diffusion Models"
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
Official Jax Implementation of MD4 Masked Diffusion Models
[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades the conventional diffusion model with an additional semantic prior.
A curated list of awesome discrete diffusion model resources.
[ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Simple and Effective Masked Diffusion Language Model
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".
Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning"
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
Code for "Differentiable Robot Rendering" (CoRL 2024)
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
A high-throughput and memory-efficient inference and serving engine for LLMs
Janus-Series: Unified Multimodal Understanding and Generation Models