University of Science and Technology of China, Hefei, Anhui

Stars
[CVPR 2025] Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
[ICLR 2024 spotlight] Official implementation of "InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior".
[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
[NeurIPS 2024] Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials
cvg / Mask3D
Forked from JonasSchult/Mask3D. Mask3D predicts accurate 3D semantic instances, achieving state-of-the-art results on ScanNet, ScanNet200, S3DIS and STPLS3D.
The official implementation of our paper "LAR: Look Around and Refer".
[NeurIPS 2024] MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving & Foundation Models in Autonomous System
Mitsuba 3: A Retargetable Forward and Inverse Renderer
[ECCV'24] OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Official code for the NeurIPS 2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
😎 Awesome lists of papers and codes about open-vocabulary perception, including both 3D and 2D
[ECCV 2024] The official code of paper "Open-Vocabulary SAM".
[WSDM 2022] The official implementation for paper: Ada-GNN: Adapting to Local Patterns for Improving Graph Neural Networks
[NeurIPS 2024] Are Your Models Still Fair? Fairness Attacks on Graph Neural Networks via Node Injections.
OpenSpaceAI / mmdepth
Forked from RuijieZhu94/mmdepth. Monocular Depth Estimation Toolbox and Benchmark. [arXiv'24 ScaleDepth, TCSVT'24 Plane2Depth, TIP'24 Binsformer]
[NeurIPS 2024] A Unified Framework for 3D Scene Understanding
A point cloud rendering tool based on Python and Mitsuba; you can use Mitsuba's rich features to render virtual point cloud scenes. Note: Mitsuba 0.5 is suitable for CPU rendering and Mitsuba 3.0 i…
(CVPR 2023) PLA: Language-Driven Open-Vocabulary 3D Scene Understanding & (CVPR 2024) RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Code&Data for Grounded 3D-LLM with Referent Tokens
[ICLR 2025] From anything to mesh like human artists. Official impl. of "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers"
Reasoning 3D Segmentation - "segment anything"/grounding/part separation in 3D with natural conversations.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation (CVPR2024)
A Framework of Small-scale Large Multimodal Models
Code for 3D-LLM: Injecting the 3D World into Large Language Models
Code for "Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes"
[ECCV 2024 Best Paper Candidate] PointLLM: Empowering Large Language Models to Understand Point Clouds