[ECCV'24] SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization
This repository contains the official code for SparseCraft, which learns a linearized implicit neural shape function using multi-view stereo cues, enabling robust training and efficient reconstruction from sparse views.
- Python 3.8
- CUDA >= 11.7
- To use the multiresolution hash encoding provided by tiny-cuda-nn, you should have at least an RTX 2080 Ti; see https://github.com/NVlabs/tiny-cuda-nn#requirements for more details.
- This repo is tested with PyTorch 2.1.2 and CUDA 11.8 on an Ubuntu 22.04 system with an NVIDIA RTX 3090 GPU.
Make sure that the CUDA paths are set in your shell session.
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
Step 1: Clone this repository:
git clone https://github.com/maeyounes/SparseCraft.git
cd SparseCraft
Step 2: Install the required packages using conda:
conda create -n sparsecraft python=3.8
conda activate sparsecraft
python -m pip install --upgrade pip
Step 3: Install PyTorch with CUDA 11.8:
pip uninstall torch torchvision functorch tinycudann
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
Step 4: Install additional dependencies:
conda install -c conda-forge gcc=11 gxx_linux-64=11
conda install -c conda-forge libxcrypt
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.1.2+cu118.html
export LIBRARY_PATH="/usr/local/cuda-11.8/lib64/stubs:$LIBRARY_PATH"
Step 5: Install tiny-cuda-nn. Set the correct CUDA architecture for your GPU (for example, 86 for an RTX 3090); you can look up the compute capability of your GPU on NVIDIA's CUDA GPUs page (https://developer.nvidia.com/cuda-gpus).
TCNN_CUDA_ARCHITECTURES=86 pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
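If you are unsure which architecture value to use, you can query your GPU's compute capability with PyTorch once Step 3 is done; this is a minimal check, where for example an RTX 3090 reports (8, 6), i.e. 86:
import torch

# Compute capability of the first visible GPU, e.g. (8, 6) on an RTX 3090
major, minor = torch.cuda.get_device_capability(0)
print(f"TCNN_CUDA_ARCHITECTURES={major * 10 + minor}")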
Step 6: Install project requirements:
pip install -r requirements.txt
export CPLUS_INCLUDE_PATH=$CONDA_PREFIX/include:$CPLUS_INCLUDE_PATH
Step 7: Install additional dependencies for DTU evaluation:
conda install -c conda-forge embree=2.17.7
conda install -c conda-forge pyembree
pip install pyglet==1.5
Step 8: Set environment variables in the conda environment:
conda env config vars set CUDA_HOME=/usr/local/cuda-11.8
conda env config vars set LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
conda env config vars set LIBRARY_PATH="/usr/local/cuda-11.8/lib64/stubs:$LIBRARY_PATH"
conda env config vars set CPLUS_INCLUDE_PATH=$CONDA_PREFIX/include:$CPLUS_INCLUDE_PATH
conda deactivate && conda activate sparsecraft
Remember to replace the path and the CUDA version with your own.
The first time you run the code, it will take some time to compile the CUDA kernels.
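To quickly verify the installation, you can run a short sanity check. This is only a minimal sketch; instantiating the small hash-grid encoding below is also what triggers the kernel compilation mentioned above:
import torch
import tinycudann as tcnn

# Check that the CUDA build of PyTorch is installed and a GPU is visible
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())

# A tiny hash-grid encoding; building it forces tiny-cuda-nn to compile/load its kernels
encoding = tcnn.Encoding(
    n_input_dims=3,
    encoding_config={
        "otype": "HashGrid",
        "n_levels": 4,
        "n_features_per_level": 2,
        "log2_hashmap_size": 15,
        "base_resolution": 16,
        "per_level_scale": 1.5,
    },
)
x = torch.rand(8, 3, device="cuda")
print(encoding(x).shape)  # expected: torch.Size([8, 8])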
We provide preprocessed DTU data and results for the novel view synthesis and reconstruction tasks. The data is available on Google Drive.
Download the data and extract it to the root directory of this repository.
It contains the following directories:
sparsecraft_data
├── nvs                    # Novel View Synthesis task data and results
│   ├── mvs_data
│   │   ├── scan103
│   │   ├── ...
│   └── results            # Results for training using 3, 6, and 9 views
│       ├── 3v
│       │   ├── scan103
│       │   ├── ...
│       ├── 6v
│       │   ├── scan103
│       │   ├── ...
│       └── 9v
│           ├── scan103
│           ├── ...
└── reconstruction         # Surface Reconstruction task data and results
    ├── mvs_data           # Uses a different set of scans and views than the novel view synthesis task
    │   ├── set0
    │   │   ├── scan105
    │   │   ├── ...
    │   └── set1
    │       ├── scan105
    │       ├── ...
    └── results
        ├── set0
        │   ├── scan105
        │   ├── ...
        └── set1
            ├── scan105
            ├── ...
The DTU dataset was preprocessed as follows:
- The original data is from the NeuS Project. We use the same camera poses and intrinsics as the original data.
- To obtain MVS data, we used COLMAP initialized with the original camera poses and intrinsics.
- We provide a script that achieves this in scripts, which you can run using the following command. Note that you will need to have COLMAP installed on your machine:
python scripts/dtu_colmap_mvs.py --scan_dir sparsecraft_data/nvs/mvs_data/scan8 --task novel_view_synthesis --n_views 3
where --scan_dir specifies the directory of the scan that contains the images and the cameras.npz file, --task specifies the task (novel_view_synthesis or surface_reconstruction), and --n_views specifies the number of views to be used for the novel view synthesis task.
The script will generate the MVS data in the specified --scan_dir directory.
For custom data preparation, you can put the images in a folder scene_name/images and run the following command to obtain the camera calibration using COLMAP:
python scripts/run_colmap.py absolute_path/to/your/scene_name
then run the following script to generate the MVS data:
python scripts/custom_colmap_mvs.py --scene_dir absolute_path/to/your/scene_name
where --scene_dir specifies the directory of the scene that contains the images and the files generated by COLMAP.
This process can take some time depending on the number and resolution of the images. Also, make sure to use the CUDA version of COLMAP for faster processing.
After running the script, you will find the MVS data in scene_name/all_views/dense/fused.ply.
You can visualize the generated point cloud using Meshlab or any other point cloud viewer.
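You can also inspect the fused point cloud programmatically, for instance with trimesh (assuming trimesh is available in your environment; adjust the path to your scene):
import trimesh

# Load the fused MVS point cloud produced by COLMAP
pc = trimesh.load("scene_name/all_views/dense/fused.ply")
print("number of points:", len(pc.vertices))
print("bounding box:", pc.bounds)
# pc.show()  # opens an interactive viewer if pyglet is installed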
If COLMAP fails to generate the MVS data, or the generated point cloud is not dense enough and gives poor results, you can use more advanced MVS methods (traditional or deep learning-based) to generate it instead.
You can find some of the available MVS methods in the Awesome-MVS repository.
The config files of the reconstruction task for running SparseCraft on the provided scenes are located in configs/surface-reconstruction.
You can find the configuration for training on DTU with 3 views in the dtu-sparse-3v.yaml file. Note that the reconstruction task uses a different set of scans and views than the novel view synthesis task.
You can use the following command to train the model on scan55 from the DTU dataset with 3 views:
python launch.py --config configs/surface-reconstruction/dtu-sparse-3v.yaml --gpu 0 --train tag=reconstruction_dtu_scan55_3v
where --gpu specifies the GPU to be used (GPU 0 is used by default), --train runs the training procedure, and tag= gives a name to the experiment. Use python launch.py --help to see all available options.
The checkpoints and experiment outputs are saved to exp_dir/[name]/[tag]@[timestamp], and tensorboard logs can be found at runs_dir/[name]/[tag]@[timestamp].
You can change any configuration in the YAML config file by specifying arguments without --, for example:
python launch.py --config configs/surface-reconstruction/dtu-sparse-3v.yaml --gpu 0 --train tag=reconstruction_dtu_scan55_3v seed=0 trainer.max_steps=50000
The training procedure is followed by a testing step on the input images, whose metrics you can simply ignore for this task, and exports the geometry as a triangular mesh.
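Before cleaning and evaluation, you can run a quick sanity check on the exported mesh, e.g. with trimesh; the mesh filename below is only illustrative, check the save directory for the actual file produced by your run:
import trimesh

# Illustrative path: replace the filename with the mesh actually exported in the save directory
mesh = trimesh.load("exp_dir/reconstruction-3v-scan55/reconstruction_dtu_scan55_3v@[timestamp]/save/exported_mesh.obj")
print("vertices:", len(mesh.vertices), "faces:", len(mesh.faces))
print("watertight:", mesh.is_watertight)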
To evaluate on the DTU dataset, you need to download SampleSet and Points from DTU's website. Extract the data and make sure the directory structure is as follows:
SampleSet
├── MVS Data
└── Points
Similar to previous works, we first clean the raw mesh with object masks by running:
python evaluation/clean_dtu_mesh.py --dtu_input_dir ./sparsecraft_data/reconstruction/mvs_data/set0/scan55 --mesh_dir exp_dir/reconstruction-3v-scan55/reconstruction_dtu_scan55_3v@[timestamp]/save
Then, run the evaluation script:
python evaluation/eval_dtu_python.py --dataset_dir PATH_to_SampleSet --mesh_dir ./exp_dir/reconstruction-3v-scan55/reconstruction_dtu_scan55_3v@[timestamp]/save/ --scan 55
The script computes the F-score, completeness, and accuracy metrics based on Chamfer distances between the generated mesh and the ground truth.
The results are saved in the exp_dir/[name]/[tag]@[timestamp]/save directory.
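For reference, accuracy measures distances from the reconstruction to the ground truth, completeness measures the reverse, and the F-score combines precision and recall at a distance threshold. The following is only a simplified sketch of these Chamfer-based metrics, not the exact evaluation script used in this repo (which, among other things, also works with the cleaned, masked meshes):
import numpy as np
from scipy.spatial import cKDTree

def chamfer_metrics(pred_pts, gt_pts, threshold=0.5):
    # pred_pts: (N, 3) points sampled from the reconstructed mesh; gt_pts: (M, 3) ground-truth points
    # threshold is in the dataset's units (illustrative value)
    d_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts)   # accuracy distances
    d_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts)   # completeness distances
    accuracy = d_pred_to_gt.mean()
    completeness = d_gt_to_pred.mean()
    precision = (d_pred_to_gt < threshold).mean()
    recall = (d_gt_to_pred < threshold).mean()
    fscore = 2 * precision * recall / (precision + recall + 1e-8)
    return accuracy, completeness, fscore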
Please note that results can differ slightly across runs due to randomness in the training process. This is mainly because the hash encoding library tiny-cuda-nn used in this codebase is not deterministic. More details about this issue can be found in the tiny-cuda-nn repository.
The config files of the novel view synthesis task for running SparseCraft on the provided scenes are located in configs/novel-view-synthesis.
You can find configurations for training on DTU with 3, 6 and 9 views in the dtu-sparse-3v.yaml, dtu-sparse-6v.yaml, and dtu-sparse-9v.yaml files, respectively.
Use the following command to train the model on scan55 from the DTU dataset with 3 views:
python launch.py --config configs/novel-view-synthesis/dtu-sparse-3v.yaml --gpu 0 --train tag=nvs_dtu_scan55_3v
The training procedure is followed by testing, which computes metrics on the provided test data, and exports the geometry as triangular meshes.
If you only want to run testing with an already trained model, you can use the following command:
python launch.py --config path/to/your/exp/config/parsed.yaml --train --resume path/to/your/exp/ckpt/epoch=0-step=20000.ckpt --gpu 0
You can find the configuration for training on custom data in the configs/surface-reconstruction/custom-sparse.yaml file.
- Adapt the root_dir and img_wh (or img_downscale) options in the config file to your data.
- The scene is normalized so that cameras have a minimum distance of 1.0 to the center of the scene. Setting model.radius=1.0 works in most cases. If not, try setting a smaller/larger radius that wraps tightly around your foreground object.
- There are three choices to determine the scene center: dataset.center_est_method=camera uses the center of all camera positions as the scene center; dataset.center_est_method=lookat assumes the cameras are looking at the same point and calculates an approximate look-at point as the scene center (see the sketch after this list); dataset.center_est_method=point uses the center of all points (reconstructed by COLMAP) that are bounded by cameras as the scene center. Please choose an appropriate method according to your capture.
- Depending on your hardware, you may need to adjust the model.num_samples_per_ray and model.max_train_num_rays options in the config file to fit your GPU memory and to trade off quality against training time: a larger number of samples per ray and of rays will improve the reconstruction quality but will require more memory and time.
- The trainer.max_steps option in the config file specifies the number of training steps. You can adjust this value according to the number of input images and the complexity of the scene. Check the differences between the config files of the novel view synthesis task for a better understanding of the parameters.
- You will also have to tune other hyperparameters of the hash encoding depending on the complexity of the scene and the number of input images. To regularize the geometry by progressively increasing the capacity of the hash encoding, tune the hash encoding parameters start_level and update_steps. We show in the paper that, in the few-shot setting, it is beneficial to set update_steps such that the model reaches full capacity at 80% of the training time.
- Keep in mind that, because the capacity of the hash encoding is limited at the beginning of training, it is important to regularize the specular component of the color by setting lambda_specular_color to a non-zero value. This helps the model focus on the geometry at the beginning of training.
- Depending on the density of the obtained MVS point cloud, you might need to adjust the sampling parameters used for the Taylor-based regularizations: sampling_taylor_step, sampling_taylor_sigma, and sampling_taylor_n.
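As referenced in the list above, a common way to compute an approximate look-at point is the least-squares point closest to all camera viewing rays. The sketch below only illustrates the idea and is not necessarily the exact implementation used in the dataset code:
import numpy as np

def estimate_lookat_center(origins, directions):
    # origins: (N, 3) camera centers; directions: (N, 3) camera viewing directions
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, di in zip(origins, d):
        P = np.eye(3) - np.outer(di, di)  # projection orthogonal to this viewing ray
        A += P
        b += P @ o
    # Point minimizing the summed squared distance to all viewing rays
    return np.linalg.solve(A, b)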
You can use the following command to train the model on a custom scene:
python launch.py --config configs/surface-reconstruction/custom-sparse.yaml --gpu 0 --train tag=reconstruction_custom_scene
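As with the DTU configs, the options discussed in the tips above can also be overridden on the command line instead of editing the YAML file. The exact key names (e.g. dataset.root_dir) and the values below are illustrative and should be adapted to your config and scene:
python launch.py --config configs/surface-reconstruction/custom-sparse.yaml --gpu 0 --train tag=reconstruction_custom_scene dataset.root_dir=/absolute/path/to/scene_name dataset.center_est_method=lookat model.radius=1.0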
Coming soon...
This work is mainly adapted from the repository Instant Neural Surface Reconstruction, with some code snippets taken from NeuS, Instant-angelo, FSGS, LLFF, and RegNerf.
Many thanks to all the authors for their great work and contributions.
If you find this codebase useful, please consider citing:
@article{younes2024sparsecraft,
title={SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization},
author={Mae Younes and Amine Ouasfi and Adnane Boukhayma},
year={2024},
url={https://arxiv.org/abs/2407.14257}
}