Stars
[CVPR 2024] GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
Vision-and-Language Navigation in Continuous Environments using Habitat
SpatialLM: Large Language Model for Spatial Understanding
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
[ECCV 2024] Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
[ECCV 2024] GenAD: Generative End-to-End Autonomous Driving
OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
Official code of *DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model*
[ECCV 2024] 3D World Model for Autonomous Driving
Align Anything: Training All-modality Model with Feedback
Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.
[CVPR 2025] CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
ViPlanner: Visual Semantic Imperative Learning for Local Navigation
[CVPR'25] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
[arXiv 2025] Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
Open-source video dataset with annotations for dynamic scenes and camera movements
Official repo for the paper "CamI2V: Camera-Controlled Image-to-Video Diffusion Model"
The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”
Unified framework for robot learning built on NVIDIA Isaac Sim
Janus-Series: Unified Multimodal Understanding and Generation Models
Cosmos is a world model development platform that consists of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
[CVPR 2024] Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis