Official implementation of Continuous 3D Perception Model with Persistent State
QianqianWang*, Yifei Zhang*, Aleksander Holynski, Alexei A Efros, Angjoo Kanazawa
(*: equal contribution)
- Release multi-view stereo results on the DL3DV dataset.
- Online demo integrated with a webcam.
- Clone CUT3R.
git clone https://github.com/CUT3R/CUT3R.git
cd CUT3R
- Create the environment.
conda create -n cut3r python=3.11 cmake=3.14.0
conda activate cut3r
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia # use the correct version of cuda for your system
pip install -r requirements.txt
# issues with pytorch dataloader, see https://github.com/pytorch/pytorch/issues/99625
conda install 'llvm-openmp<16'
# for training logging
pip install git+https://github.com/nerfstudio-project/gsplat.git
# for evaluation
pip install evo
pip install open3d
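After installation, a quick sanity check that the core dependencies import and CUDA is visible can save debugging time later (plain PyTorch/Open3D calls, nothing repo-specific):

```python
# Quick sanity check of the freshly created environment.
import torch
import open3d

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("open3d", open3d.__version__)
```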
- Compile the CUDA kernels for RoPE (as in CroCo v2).
cd src/croco/models/curope/
python setup.py build_ext --inplace
cd ../../../../
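To confirm the build succeeded, a minimal import check can help. This sketch assumes it is run from `src/croco/` so that `models.curope` is importable, and that the extension exports `cuRoPE2D` as in CroCo v2; both are assumptions to adjust to your setup:

```python
# Sanity-check sketch: confirm the compiled RoPE extension imports.
# Assumes the working directory is src/croco/ and that the extension
# exports cuRoPE2D as in CroCo v2 -- both assumptions, not guarantees.
try:
    from models.curope import cuRoPE2D
    print("cuRoPE2D CUDA kernels available:", cuRoPE2D)
except ImportError as e:
    print("CUDA RoPE unavailable, falling back to the PyTorch implementation:", e)
```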
We currently provide checkpoints on Google Drive:
Modelname | Training resolutions | #Views | Head |
---|---|---|---|
cut3r_224_linear_4.pth | 224x224 | 16 | Linear |
cut3r_512_dpt_4_64.pth | 512x384, 512x336, 512x288, 512x256, 512x160, 384x512, 336x512, 288x512, 256x512, 160x512 | 4-64 | DPT |
`cut3r_224_linear_4.pth` is our intermediate checkpoint and `cut3r_512_dpt_4_64.pth` is our final checkpoint.
To download the weights, run the following commands:
cd src
# for 224 linear ckpt
gdown --fuzzy https://drive.google.com/file/d/11dAgFkWHpaOHsR6iuitlB_v4NFFBrWjy/view?usp=drive_link
# for 512 dpt ckpt
gdown --fuzzy https://drive.google.com/file/d/1Asz-ZB3FfpzZYwunhQvNPZEUA8XUNAYD/view?usp=drive_link
cd ..
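If you want to inspect a downloaded checkpoint before running inference, a minimal sketch using standard `torch.load` (the key names printed depend entirely on how the checkpoint was saved; depending on your PyTorch version you may need `weights_only=False` if the file stores metadata alongside raw tensors):

```python
# Sketch: peek at a downloaded checkpoint's top-level structure.
import torch

# Some PyTorch versions default to weights_only=True, which can reject
# checkpoints containing pickled training metadata.
ckpt = torch.load("src/cut3r_512_dpt_4_64.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # top-level entries; names depend on how it was saved
```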
To run the inference code, you can use the following command:
# the following script will run inference offline and visualize the output with viser on port 8080
python demo.py --model_path MODEL_PATH --seq_path SEQ_PATH --size SIZE --vis_threshold VIS_THRESHOLD --output_dir OUT_DIR # input can be a folder or a video
# Example:
# python demo.py --model_path src/cut3r_512_dpt_4_64.pth --size 512 \
# --seq_path examples/001 --vis_threshold 1.5 --output_dir tmp
#
# python demo.py --model_path src/cut3r_224_linear_4.pth --size 224 \
# --seq_path examples/001 --vis_threshold 1.5 --output_dir tmp
Output results will be saved to `output_dir`.
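Since `--seq_path` accepts either a folder of images or a video, you may want to pre-extract (and subsample) frames yourself. A minimal sketch using OpenCV (`opencv-python` is an extra dependency here, not implied by the repo; path names are illustrative):

```python
# Helper sketch: dump every `stride`-th frame of a video into a folder
# that can then be passed to demo.py via --seq_path.
import os
import cv2

def video_to_frames(video_path: str, out_dir: str, stride: int = 1) -> None:
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()

video_to_frames("examples/video.mp4", "examples/video_frames", stride=2)
```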
Currently, we accelerate the feedforward process by processing inputs in parallel within the encoder, which results in linear memory consumption as the number of frames increases.
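As a rough illustration of why memory grows linearly with the number of frames (a toy stand-in, not the actual CUT3R encoder or its API):

```python
# Illustrative sketch (not the repo's API): batching all frames through an
# encoder in one forward pass parallelizes compute, but one feature map is
# kept per frame, so activation memory grows linearly with frame count N.
import torch
import torch.nn as nn

encoder = nn.Conv2d(3, 64, kernel_size=3, padding=1)  # stand-in for the ViT encoder
frames = torch.randn(32, 3, 224, 224)                 # N = 32 input frames

with torch.no_grad():
    feats = encoder(frames)  # all frames encoded in parallel
print(feats.shape)           # torch.Size([32, 64, 224, 224]) -- O(N) memory
```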
Our training data includes 32 datasets listed below. We provide processing scripts for all of them. Please download the datasets from their official sources, and refer to preprocess.md for processing scripts and more information about the datasets.
- ARKitScenes
- BlendedMVS
- CO3Dv2
- MegaDepth
- ScanNet++
- ScanNet
- Waymo Open Dataset
- WildRGB-D
- Map-free
- TartanAir
- UnrealStereo4K
- Virtual KITTI 2
- 3D Ken Burns
- BEDLAM
- COP3D
- DL3DV
- Dynamic Replica
- EDEN
- Hypersim
- IRS
- Matterport3D
- MVImgNet
- MVS-Synth
- OmniObject3D
- PointOdyssey
- RealEstate10K
- SmartPortraits
- Spring
- Synscapes
- UASOL
- UrbanSyn
- HOI4D
Please follow MonST3R and Spann3R to prepare Sintel, Bonn, KITTI, NYU-v2, TUM-dynamics, ScanNet, 7scenes and Neural-RGBD datasets.
The datasets should be organized as follows:
data/
├── 7scenes
├── bonn
├── kitti
├── neural_rgbd
├── nyu-v2
├── scannetv2
├── sintel
└── tum
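A small helper to verify this layout before running evaluation (directory names taken from the tree above):

```python
# Quick check that the evaluation data matches the expected layout.
from pathlib import Path

expected = ["7scenes", "bonn", "kitti", "neural_rgbd",
            "nyu-v2", "scannetv2", "sintel", "tum"]
missing = [d for d in expected if not (Path("data") / d).is_dir()]
print("missing datasets:", missing or "none")
```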
Please refer to eval.md for more details.
To fine-tune the released checkpoints, you can use the two provided config files as a starting point. Note that these configs correspond to the final stage of training, where the goal is to train the model to handle long sequences; accordingly, the encoders are frozen and single-view datasets are removed (see the sketch after the commands below). You may adjust the configurations to suit your requirements.
# Remember to replace the dataset paths with your own paths
# the script has been tested on an 8xA100 (80G) machine
cd src
# finetune 512 checkpoint
CUDA_LAUNCH_BLOCKING=1 NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --multi_gpu train.py --config-name dpt_512_vary_4_64
# finetune 224 checkpoint
CUDA_LAUNCH_BLOCKING=1 NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --multi_gpu train.py --config-name linear_224_fixed_16
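For reference, "frozen encoders" means the encoder parameters receive no gradient updates during fine-tuning. A generic PyTorch sketch of the idea (a stand-in module, not the actual model class):

```python
# Generic sketch of what "frozen encoder" means in these configs:
# gradients are disabled so the optimizer never updates those weights.
import torch.nn as nn

def freeze(module: nn.Module) -> None:
    for p in module.parameters():
        p.requires_grad_(False)

encoder = nn.Linear(768, 768)  # stand-in; the real model uses a ViT encoder
freeze(encoder)
print(all(not p.requires_grad for p in encoder.parameters()))  # True
```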
Our code is based on several awesome repositories, and we thank the authors for releasing their code!
If you find our work useful, please cite:
@article{wang2025continuous,
  title={Continuous 3D Perception Model with Persistent State},
  author={Wang, Qianqian and Zhang, Yifei and Holynski, Aleksander and Efros, Alexei A and Kanazawa, Angjoo},
  journal={arXiv preprint arXiv:2501.12387},
  year={2025}
}