Huaiyuan Xu · Junliang Chen · Shiyu Meng · Yi Wang · Lap-Pui Chau*
This work focuses on 3D dense perception in autonomous driving, encompassing LiDAR-Centric Occupancy Perception, Vision-Centric Occupancy Perception, and Multi-Modal Occupancy Perception, and discusses the information fusion techniques used in this field. We believe this is the most comprehensive survey to date on 3D Occupancy Perception. Please stay tuned! 😉😉😉 Continually updated.
✨ You are welcome to share your work on any topic related to 3D occupancy for autonomous driving (not only perception, but also applications)!
If you discover any missing work or have any suggestions, please feel free to submit a pull request or contact us. We will promptly add the missing papers to this repository.
[1] A systematic survey of the latest research on 3D occupancy perception in the field of autonomous driving.
[2] The survey provides a taxonomy of 3D occupancy perception and elaborates on core methodological issues, including network pipelines, multi-source information fusion, and effective network training.
[3] The survey presents evaluations of 3D occupancy perception and offers detailed performance comparisons. Furthermore, current limitations and future research directions are discussed.
3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, this technology is emerging as a trend in autonomous driving perception systems and is attracting significant attention from both industry and academia. Similar to traditional bird's-eye view (BEV) perception, 3D occupancy perception is multi-source in nature and requires information fusion. The difference is that it captures the vertical structures that 2D BEV ignores. In this survey, we review the most recent works on 3D occupancy perception and provide in-depth analyses of methodologies with various input modalities. Specifically, we summarize general network pipelines, highlight information fusion techniques, and discuss effective network training. We evaluate and analyze the occupancy perception performance of state-of-the-art methods on the most popular datasets. Furthermore, challenges and future research directions are discussed. We hope this report will inspire the community and encourage more research on 3D occupancy perception.
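To make the "vertical structures" point concrete, below is a minimal illustrative sketch (not code from the survey or from any listed method): it represents a scene as a semantic voxel occupancy grid, collapses it into a 2D BEV map, and computes the voxel-wise mIoU metric commonly reported by the occupancy benchmarks listed later. The grid shape, the `FREE` label, and the class IDs are assumptions made purely for illustration.

```python
# Minimal sketch: semantic occupancy grid vs. 2D BEV, plus voxel-wise mIoU.
# Shapes, class IDs, and the FREE label are illustrative assumptions.
import numpy as np

FREE = 0  # hypothetical label for empty voxels


def bev_from_occupancy(occ_grid: np.ndarray) -> np.ndarray:
    """Collapse a semantic occupancy grid of shape (X, Y, Z) into a BEV map
    of shape (X, Y) by keeping the highest occupied voxel in each pillar."""
    X, Y, _ = occ_grid.shape
    bev = np.full((X, Y), FREE, dtype=occ_grid.dtype)
    for z in range(occ_grid.shape[2]):  # higher z overwrites lower z
        layer = occ_grid[:, :, z]
        mask = layer != FREE
        bev[mask] = layer[mask]
    return bev


def voxel_miou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Voxel-wise mean IoU over semantic classes (FREE excluded), the metric
    commonly used to evaluate semantic occupancy prediction."""
    ious = []
    for c in range(1, num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0


# Toy example: a 4 x 4 x 8 grid with a "car" (class 1) beneath overhanging
# "vegetation" (class 2) in the same pillar.
occ = np.zeros((4, 4, 8), dtype=np.int64)
occ[1, 1, 0:2] = 1   # car occupies the two lowest voxels of the pillar
occ[1, 1, 5] = 2     # vegetation overhangs the same pillar
print(bev_from_occupancy(occ)[1, 1])        # -> 2 (the car is hidden in BEV)
print(voxel_miou(occ, occ, num_classes=3))  # -> 1.0 (perfect prediction)
```

In this toy example, the car voxels under the overhanging vegetation are present in the 3D grid but disappear in the BEV map, which is exactly the vertical information that 3D occupancy perception preserves.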
- Introduction
- Summary of Contents
- Methods: A Survey
- 3D Occupancy Datasets
- Occupancy-based Applications
- Cite The Survey
- Contact
Year | Venue | Paper Title | Link |
---|---|---|---|
2024 | AAAI | Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence | Project Page |
2023 | T-IV | Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders | Code |
2023 | arXiv | PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction | Code |
2023 | arXiv | LiDAR-based 4D Occupancy Completion and Forecasting | Project Page |
2021 | T-PAMI | Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data | - |
2021 | AAAI | Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion | Code |
2020 | CoRL | S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds | - |
2020 | 3DV | LMSCNet: Lightweight Multiscale 3D Semantic Completion | Code |
Year | Venue | Paper Title | Link |
---|---|---|---|
2024 | IJCAI | Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion | Code |
2024 | ICRA | RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision | Code |
2024 | ICRA | MonoOcc: Digging into Monocular Semantic Occupancy Prediction | Code |
2024 | ICRA | FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird’s-Eye View and Perspective View | - |
2024 | CVPR | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | Code |
2024 | CVPR | SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction | Project Page |
2024 | CVPR | SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction | Project Page |
2024 | CVPR | PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation | Code |
2024 | CVPR | Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation | Code |
2024 | CVPR | COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction | Code |
2024 | CVPR | Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles | Project Page |
2024 | CVPR | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | Code |
2024 | CVPR | Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation | Project Page |
2024 | CVPR | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | - |
2024 | AAAI | Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving | Code |
2024 | AAAI | One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception | - |
2024 | RA-L | Multi-Camera Unified Pre-Training via 3D Scene Reconstruction | Code |
2024 | arXiv | OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow | - |
2024 | arXiv | OccFiner: Offboard Occupancy Refinement with Hybrid Propagation | - |
2024 | arXiv | InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction | Code |
2024 | arXiv | Unified Spatio-Temporal Tri-Perspective View Representation for 3D Semantic Occupancy Prediction | Project Page |
2024 | arXiv | ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers | - |
2023 | T-IV | 3DOPFormer: 3D Occupancy Perception from Multi-Camera Images with Directional and Distance Enhancement | Code |
2023 | NeurIPS | POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images | Project Page |
2023 | NeurIPS | Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving | Project Page |
2023 | ICCV | SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving | Project Page |
2023 | ICCV | Scene as Occupancy | Code |
2023 | ICCV | OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction | Code |
2023 | ICCV | NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space | Code |
2023 | CVPR | VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion | Code |
2023 | CVPR | Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction | Project Page |
2023 | arXiv | SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street Views | Code |
2023 | arXiv | SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints | - |
2023 | arXiv | OVO: Open-Vocabulary Occupancy | Code |
2023 | arXiv | OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries | Code |
2023 | arXiv | OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving | Project Page |
2023 | arXiv | OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields | Project Page |
2023 | arXiv | OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion | Code |
2023 | arXiv | Fully Sparse 3D Occupancy Prediction | Code |
2023 | arXiv | FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin | Code |
2023 | arXiv | FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation | Code |
2023 | arXiv | DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion | - |
2023 | arXiv | Camera-based 3D Semantic Scene Completion with Sparse Guidance Network | Code |
2023 | arXiv | A Simple Framework for 3D Occupancy Estimation in Autonomous Driving | Code |
2023 | arXiv | UniWorld: Autonomous Driving Pre-training via World Models | Code |
2022 | CVPR | MonoScene: Monocular 3D Semantic Scene Completion | Project Page |
Year | Venue | Paper Title | Link |
---|---|---|---|
2024 | arXiv | Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution | - |
2024 | arXiv | OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving | Project Page |
2024 | arXiv | OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction | - |
2024 | arXiv | Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction | Project Page |
2024 | arXiv | Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception | - |
2023 | ICCV | OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception | Code |
Dataset | Year | Venue | Modality | # of Classes | Flow | Link |
---|---|---|---|---|---|---|
OpenScene | 2024 | CVPR 2024 Challenge | Camera | - | ✔️ | Intro. |
Cam4DOcc | 2024 | CVPR | Camera+LiDAR | 2 | ✔️ | Intro. |
Occ3D | 2023 | NeurIPS | Camera | 14 (Occ3D-Waymo), 16 (Occ3D-nuScenes) | ❌ | Intro. |
OpenOcc | 2023 | ICCV | Camera | 16 | ❌ | Intro. |
OpenOccupancy | 2023 | ICCV | Camera+LiDAR | 16 | ❌ | Intro. |
SurroundOcc | 2023 | ICCV | Camera | 16 | ❌ | Intro. |
OCFBench | 2023 | arXiv | LiDAR | - (OCFBench-Lyft), 17 (OCFBench-Argoverse), 25 (OCFBench-ApolloScape), 16 (OCFBench-nuScenes) | ❌ | Intro. |
SSCBench | 2023 | arXiv | Camera | 19 (SSCBench-KITTI-360), 16 (SSCBench-nuScenes), 14 (SSCBench-Waymo) | ❌ | Intro. |
SemanticKITTI | 2019 | ICCV | Camera+LiDAR | 19 (Semantic Scene Completion task) | ❌ | Intro. |
Specific Task | Year | Venue | Paper Title | Link |
---|---|---|---|---|
BEV Segmentation | 2024 | arXiv | OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks | - |
3D Panoptic Segmentation | 2024 | CVPR | PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation | Code |
Specific Task | Year | Venue | Paper Title | Link |
---|---|---|---|---|
3D Flow Prediction | 2024 | CVPR | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | Code |
Specific Task | Year | Venue | Paper Title | Link |
---|---|---|---|---|
3D Object Detection | 2024 | CVPR | Learning Occupancy for Monocular 3D Object Detection | Code |
3D Object Detection | 2023 | arXiv | SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection | Code |
Specific Task | Year | Venue | Paper Title | Link |
---|---|---|---|---|
Scene Generation | 2024 | CVPR | SemCity: Semantic Scene Generation with Triplane Diffusion | Code |
Specific Tasks | Year | Venue | Paper Title | Link |
---|---|---|---|---|
Occupancy Prediction, 3D Object Detection, Online Mapping, Multi-object Tracking, Motion Prediction, Motion Planning | 2024 | CVPR | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | - |
Occupancy Prediction, 3D Object Detection, BEV segmentation, Motion Planning | 2023 | ICCV | Scene as Occupancy | Code |
If you find our survey and repository useful for your research project, please consider citing our paper:
@misc{xu2024survey,
      title={A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective},
      author={Huaiyuan Xu and Junliang Chen and Shiyu Meng and Yi Wang and Lap-Pui Chau},
      year={2024},
      eprint={2405.05173},
      archivePrefix={arXiv}
}