Stars
Wan: Open and Advanced Large-Scale Video Generative Models
Code for the project "MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos"
Video Generation Foundation Models: https://saiyan-world.github.io/goku/
Official implementation of Continuous 3D Perception Model with Persistent State
Janus-Series: Unified Multimodal Understanding and Generation Models
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
💪 [arXiv 2025] PyTorch implementation of 'HAC++: Towards 100X Compression of 3D Gaussian Splatting'
Code for "Real3D: Scaling Up Large Reconstruction Models with Real-World Images"
🍳 [arXiv'24] PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting
Self-reimplemented version of Long-LRM.
Official implementation of the paper "Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content".
Code for "MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training", arXiv 2025.
Official code for paper: F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting
[arXiv 2025] Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Official repository for "Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders"
Code release for https://kovenyu.com/WonderWorld/
Official implementation for paper - LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
Code for MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
A generative world for general-purpose robotics & embodied AI learning.
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
[ICLR'25] 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
[ICLR'25] SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)