Under construction.
2023/2/12 Initial code release.
A full demo video can be downloaded here.
- Create a conda environment with Python 3.8.
- Install PyTorch and torchvision with the versions specified in requirements.txt.
- Follow the instructions at https://mmdetection3d.readthedocs.io/en/latest/getting_started.html#installation to install mmcv-full, mmdet, mmsegmentation and mmdet3d with the versions specified in requirements.txt.
- Install timm, numba and pyyaml with the versions specified in requirements.txt.
- Download the pretrained weights from https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth and put them in ckpts/
- Create a soft link from data/nuscenes to your_nuscenes_path
- Download the train/val pickle files and put them in data/
  - nuscenes_infos_train.pkl: https://cloud.tsinghua.edu.cn/f/ede3023e01874b26bead/?dl=1
  - nuscenes_infos_val.pkl: https://cloud.tsinghua.edu.cn/f/61d839064a334630ac55/?dl=1
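
After the setup steps above, the repository should contain the pretrained checkpoint, the nuScenes link, and the two pickle files. A minimal sketch of a layout check follows. The helper `missing_setup_files` is a hypothetical name and is not part of this repository; the paths are simply the ones listed in the steps above, taken relative to the repository root.

```python
from pathlib import Path

# Paths taken from the installation steps above, relative to the repo root.
EXPECTED_PATHS = [
    "ckpts/r101_dcn_fcos3d_pretrain.pth",
    "data/nuscenes",
    "data/nuscenes_infos_train.pkl",
    "data/nuscenes_infos_val.pkl",
]


def missing_setup_files(repo_root="."):
    """Return the expected paths that do not exist under repo_root.

    Hypothetical helper for illustration; not part of the TPVFormer code.
    """
    root = Path(repo_root)
    # Path.exists() follows symlinks, so a dangling data/nuscenes link
    # is reported as missing.
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_setup_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files are in place.")
```

Running the script from the repository root before training gives a quick indication of which step, if any, was skipped.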
- Train TPVFormer for lidar segmentation task on A100 with 40G GPU memory.
bash launcher.sh config/tpv_lidarseg.py out/tpv_lidarseg
- Train TPVFormer for lidar segmentation task on 3090 with 24G GPU memory.
bash launcher.sh config/tpv_lidarseg_dim96.py out/tpv_lidarseg_dim96
- Train TPVFormer for 3D semantic occupancy prediction task on 3090 with 24G GPU memory.
bash launcher.sh config/tpv04_occupancy.py out/tpv_occupancy --lovasz-input voxel
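
The three commands above share one pattern: `bash launcher.sh`, a config file, an output directory, and optional flags. As a rough sketch, that mapping can be tabulated as below. The function `training_command` and the dictionary keys are hypothetical names chosen for illustration; only the config paths, output directories, and the `--lovasz-input voxel` flag come from the commands above.

```python
# Arguments copied from the training commands above, keyed by (task, GPU).
# The key names are illustrative assumptions, not identifiers used by the repo.
TRAINING_RUNS = {
    ("lidarseg", "A100-40G"): ["config/tpv_lidarseg.py", "out/tpv_lidarseg"],
    ("lidarseg", "3090-24G"): [
        "config/tpv_lidarseg_dim96.py",
        "out/tpv_lidarseg_dim96",
    ],
    ("occupancy", "3090-24G"): [
        "config/tpv04_occupancy.py",
        "out/tpv_occupancy",
        "--lovasz-input",
        "voxel",
    ],
}


def training_command(task, gpu):
    """Build the launcher argument list for a documented (task, GPU) pair."""
    key = (task, gpu)
    if key not in TRAINING_RUNS:
        raise KeyError(f"No documented training command for {key}")
    return ["bash", "launcher.sh", *TRAINING_RUNS[key]]


if __name__ == "__main__":
    # Print the command line for the occupancy task on a 24G GPU.
    print(" ".join(training_command("occupancy", "3090-24G")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root if launching from Python is preferred over typing the command.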
| | Tesla's Occupancy Network | Our TPVFormer |
|---|---|---|
| Volumetric Occupancy | Yes | Yes |
| Occupancy Semantics | Yes | Yes |
| Number of Semantic Classes | >= 5 | 16 |
| Input | 8 camera images | 6 camera images |
| Training Supervision | Dense 3D reconstruction | Sparse LiDAR semantic labels |
| Training Data | ~1,440,000,000 frames | 28,130 frames |
| Arbitrary Resolution | Yes | Yes |
| Video Context | Yes | Not yet |
| Training Time | ~100,000 GPU hours | ~300 GPU hours |
| Inference Time | ~10 ms on the Tesla FSD computer | ~290 ms on a single A100 |
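
To put the resource rows in perspective, the ratios between the two columns can be computed directly from the approximate figures in the table. The snippet below is plain arithmetic on those numbers, with illustrative variable names. The two inference times are measured on different hardware, so that ratio is not a like-for-like comparison.

```python
# Approximate figures copied from the comparison table above.
tesla = {"frames": 1_440_000_000, "gpu_hours": 100_000, "inference_ms": 10}
tpvformer = {"frames": 28_130, "gpu_hours": 300, "inference_ms": 290}

# How many times more training frames and GPU hours the Tesla column lists.
data_ratio = tesla["frames"] / tpvformer["frames"]
time_ratio = tesla["gpu_hours"] / tpvformer["gpu_hours"]

# How many times longer TPVFormer inference takes (different hardware).
latency_ratio = tpvformer["inference_ms"] / tesla["inference_ms"]

print(f"Training data ratio:  about {data_ratio:,.0f}x")
print(f"Training time ratio:  about {time_ratio:,.0f}x")
print(f"Inference time ratio: about {latency_ratio:.0f}x")
```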