Stars
Code for "StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation", arXiv 2025.
A generative world for general-purpose robotics & embodied AI learning.
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
Feature splatting based on INRIA GS rasterizer
[ICCV 2023] Official code for the paper "Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives". Official weights and demos provided.
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥
A collection of resources on controllable generation with text-to-image diffusion models.
Learning Continuous Image Representation with Local Implicit Image Function, in CVPR 2021 (Oral)
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Taming Transformers for High-Resolution Image Synthesis
[NeurIPS 2024] VideoTetris: Towards Compositional Text-To-Video Generation
[CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
[IJCV 2024] Exploiting Diffusion Prior for Real-World Image Super-Resolution
[CVPR 2024] SceneWiz3D: Towards Text-guided 3D Scene Composition
[CVPR 2024] BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
LAVIS - A One-stop Library for Language-Vision Intelligence
[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
[CVPR 2024 Highlight] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
OpenXRLab Structure-from-Motion Toolbox and Benchmark
OpenXRLab Visual Localization Toolbox and Server