Xiuyu Yang* · Yunze Man* · Jun-Kun Chen · Yu-Xiong Wang
[NeurIPS 2024] [Project Page] [arXiv] [pdf] [BibTeX] [License]
If you find our work useful in your research, please consider citing our paper:
@inproceedings{yang2024scenecraft,
title={SceneCraft: Layout-Guided 3D Scene Generation},
author={Yang, Xiuyu and Man, Yunze and Chen, Jun-Kun and Wang, Yu-Xiong},
booktitle={Advances in Neural Information Processing Systems},
year={2024}
}
Clone and set up nerfstudio (preferably the version specified below).
Tips: Follow the nerfstudio tutorials to verify the environment.
# install nerfstudio
pip install nerfstudio==0.3.4
# setup scenecraft
git clone --recurse-submodules https://github.com/OrangeSodahub/SceneCraft.git
cd SceneCraft/
pip install [-e] .
Tested environments: Python 3.9/3.10 with torch 2.0.1+cu117/cu118.
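To compare an existing environment against the tested versions listed above, a minimal sketch along these lines may help. The helper names `check_environment` and `installed_torch_version` are hypothetical and not part of SceneCraft; the check only compares version strings and does not guarantee compatibility.

```python
import sys
from importlib import metadata

# Versions reported as tested in this README.
TESTED_PYTHON = {(3, 9), (3, 10)}
TESTED_TORCH = "2.0.1"
TESTED_CUDA_TAGS = {"cu117", "cu118"}


def check_environment(python_version, torch_version):
    """Return a list of warnings; an empty list means the versions match the tested setup.

    `python_version` is a (major, minor) tuple; `torch_version` is a string such as
    "2.0.1+cu117", or None if torch is not installed.
    """
    warnings = []
    if tuple(python_version[:2]) not in TESTED_PYTHON:
        warnings.append(f"Python {python_version[0]}.{python_version[1]} is untested")
    if torch_version is None:
        warnings.append("torch is not installed")
        return warnings
    # Split "2.0.1+cu117" into the base version and the local build tag.
    base, _, local = torch_version.partition("+")
    if base != TESTED_TORCH:
        warnings.append(f"torch {base} is untested")
    if local not in TESTED_CUDA_TAGS:
        warnings.append(f"CUDA build tag '{local}' is untested")
    return warnings


def installed_torch_version():
    """Look up the installed torch version from package metadata, without importing torch."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = check_environment(sys.version_info[:2], installed_torch_version())
    for line in found or ["environment matches the tested versions"]:
        print(line)
```

A warning here only means the combination differs from the tested one; other versions may still work.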
Tips: We host our finetuned diffusion models at (SD-Scannet++) (SD-Hypersim).
Before training, download the raw data and process it into layout data.
- For Scannet++, download from here and complete the image undistortion and downscaling for the DSLR images;
- For Hypersim, download from here.
Tips: We host our processed layout data, which is used to train the diffusion models, at (Data-Scannet++) (Data-Hypersim); use it to skip the following steps.
Run the following script to convert preprocessed data to layout data (check the data paths used in the bash file):
bash scripts/prepare_dataset.sh \
${DATASET} # choose from [Scannetpp, Hypersim]
${LIMIT} # limit number of images per scene, set to 100
${GPUS} # number of gpus to use
[--split] # choose from ['train', 'val', 'all'] for scannet++
[--save-depth] # flag (store_true): save depth maps
[--voxel-size] # voxel size for scannet++ voxelization, set to 0.2; omit to disable voxelization
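For a concrete invocation, the arguments above can be assembled programmatically. The sketch below is only an illustration: `build_prepare_cmd` is a hypothetical helper, and it assumes that `--split` and `--voxel-size` take their value as the next argument and that `--save-depth` is a bare flag. Check `scripts/prepare_dataset.sh` for how the script actually parses them.

```python
import shlex


def build_prepare_cmd(dataset, limit, gpus, split=None, save_depth=False, voxel_size=None):
    """Build the argument list for scripts/prepare_dataset.sh (hypothetical helper)."""
    if dataset not in ("Scannetpp", "Hypersim"):
        raise ValueError(f"unknown dataset: {dataset}")
    cmd = ["bash", "scripts/prepare_dataset.sh", dataset, str(limit), str(gpus)]
    if split is not None:
        # Assumed form: flag followed by its value.
        cmd += ["--split", split]
    if save_depth:
        cmd.append("--save-depth")
    if voxel_size is not None:
        cmd += ["--voxel-size", str(voxel_size)]
    return cmd


# Settings suggested in the comments above: 100 images per scene, voxel size 0.2.
# The GPU count (2) is an arbitrary example value.
cmd = build_prepare_cmd("Scannetpp", 100, 2, split="train", save_depth=True, voxel_size=0.2)
print(shlex.join(cmd))
```

To execute the command, `cmd` could be passed to `subprocess.run(cmd, check=True)` from the repository root.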
Generate JSONL data for efficient loading during training (keep the same settings as above):
bash scripts/generate_json.sh ${DATASET} ${LIMIT} [--voxel-size]
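A quick sanity check of the generated JSONL file can catch truncated or malformed output before training. The sketch below assumes only that the file holds one JSON object per line. The output path and the record fields are not specified here, so `summarize_jsonl` (a hypothetical helper) just counts records and collects whatever keys it finds.

```python
import json
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, sorted_keys) for a JSONL file with one JSON value per line."""
    count = 0
    keys = set()
    with Path(path).open("r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}: invalid JSON on line {line_no}: {err}") from err
            if isinstance(record, dict):
                keys.update(record)  # collect top-level field names
            count += 1
    return count, sorted(keys)
```

Comparing the record count with the expected number of images (at most `${LIMIT}` per scene) is one simple consistency check.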
The expected directory structure of well-prepared data (e.g. scannet++):
data
├── scannetpp
| ├── data
| | ├── SCENE_ID0
| | | ├── dslr
| ├── ... ...
├── scannetpp_processed
| ├── data # same structure as scannetpp/data/
| ├── scannetpp_instance_data
| ├── [voxel_data] # optional
| ├── semantic_data
| | ├── SCENE_ID0
| | | ├── IMAGE_ID0.png
| | | ├── IMAGE_ID0.npz
| ├── ... ...
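Before training, it can be worth confirming that the processed data matches the tree above. The following sketch is not part of the repository: `check_scannetpp_layout` is a hypothetical helper that assumes the tree is nested as drawn and that every `.png` in `semantic_data` has an `.npz` with the same stem.

```python
from pathlib import Path

# Directories taken from the tree above; [voxel_data] is optional and not checked.
REQUIRED_DIRS = [
    "scannetpp/data",
    "scannetpp_processed/data",
    "scannetpp_processed/scannetpp_instance_data",
    "scannetpp_processed/semantic_data",
]


def check_scannetpp_layout(root):
    """Return a list of problems found under `root` (the top-level data directory)."""
    root = Path(root)
    problems = [f"missing directory: {d}" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    semantic = root / "scannetpp_processed" / "semantic_data"
    if semantic.is_dir():
        for scene in sorted(p for p in semantic.iterdir() if p.is_dir()):
            for png in sorted(scene.glob("*.png")):
                # Assumed pairing: IMAGE_ID.png next to IMAGE_ID.npz.
                if not png.with_suffix(".npz").is_file():
                    problems.append(f"missing npz for {scene.name}/{png.name}")
    return problems


if __name__ == "__main__":
    for problem in check_scannetpp_layout("data"):
        print(problem)
```

An empty result means the required directories exist and every semantic image found has its paired `.npz` file.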
Run the following script to train the ControlNet model (check the model and data paths used in the bash file):
bash scripts/train_controlnet_sd.sh \
${DATASET} # choose from [Scannetpp, Hypersim]
[--condition_type] # default one_hot
[--conditioning_channels] # default 8, should be less than 16
[--enable_depth_cond] # use depth condition
[--controlnet_conditioning_scale] # control factor of controlnet, e.g., 3.5 1.5
[--resume_from_checkpoint] # e.g. latest or .../checkpoint-1000
[--report_to] # e.g. wandb
# e.g.
bash scripts/train_controlnet_sd.sh hypersim --condition_type one_hot --conditioning_channels 8 --enable_depth_cond --controlnet_conditioning_scale 3.5 1.5
We use nerfacto from nerfstudio as the scene model. To generate a scene:
- get its raw data (bounding boxes, labels and cameras);
- get its layout data (semantic/depth images and jsonl file);
- train its scene model.
Step 1: this step is only needed for scene layouts that you draw yourself.
Use this webgui to draw your own layout, then export the layout and camera data files to ROOT/data/custom/(scene_id)/, which should look like:
data
├── custom
| ├── (scene_id)
| | ├── cameras.json
| | ├── layout.json
The interface is adapted from the nerfstudio viewer; in it, developers can add, remove, and edit objects and cameras. Click EXPORT to save layouts/cameras to .json files. (For those who want more customization, the source code of this interface is at thirdparty/nerfstudio/nerfstudio/viewer_legacy.)
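After exporting, a small check can confirm that both files landed in the expected place and parse as JSON. This sketch assumes nothing about the contents of `cameras.json` or `layout.json`, since their schema is not described here. `check_custom_scene` is a hypothetical helper, and `ROOT` is taken to be the repository root.

```python
import json
from pathlib import Path

EXPECTED_FILES = ("cameras.json", "layout.json")


def check_custom_scene(root, scene_id):
    """Return a list of problems for ROOT/data/custom/<scene_id>/; empty means both files load."""
    scene_dir = Path(root) / "data" / "custom" / scene_id
    problems = []
    for name in EXPECTED_FILES:
        path = scene_dir / name
        if not path.is_file():
            problems.append(f"missing file: {path}")
            continue
        try:
            # Only checks that the file is syntactically valid JSON.
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            problems.append(f"invalid JSON in {path}: {err}")
    return problems
```

For example, `check_custom_scene(".", "my_scene")` run from the repository root would inspect `data/custom/my_scene/`, where `my_scene` is a placeholder scene id.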
Step 2: run the following script to get layout data from raw data:
python scripts/generate_outputs.py \
--layout # to specify the output type
--dataset # choose from ['scannetpp', 'hypersim', 'custom']
--scene_id # scene_id
--output_dir # default outputs
Step 3: train the scene model (more specific instructions will be provided). Training details can be found in the supplementary material (Appendix Sec. A) of our paper. This training step requires at least TWO GPUs.
Check the configurations at scenecraft/configs/method and run the following script:
# set RECORD to track results via wandb
# set DEBUG to log more detailed information for debugging
[RECORD=1] [DEBUG=1] ns-train ${method_name} [--machine.num-devices ${num_gpus}]
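The same launch can be scripted from Python, which makes the environment variables explicit. The sketch below is illustrative only: `build_train_launch` is a hypothetical helper, and the method name `my_method` is a placeholder for one of the configurations under `scenecraft/configs/method`. Setting `RECORD` and `DEBUG` to `"1"` mirrors the command above.

```python
import os


def build_train_launch(method_name, num_gpus=2, record=False, debug=False, base_env=None):
    """Return (cmd, env) for launching ns-train; this step needs at least two GPUs."""
    if num_gpus < 2:
        raise ValueError("scene training requires at least two GPUs")
    # Inherit the current environment unless an explicit one is supplied.
    env = dict(os.environ if base_env is None else base_env)
    if record:
        env["RECORD"] = "1"  # track results via wandb
    if debug:
        env["DEBUG"] = "1"  # log more detailed information
    cmd = ["ns-train", method_name, "--machine.num-devices", str(num_gpus)]
    return cmd, env


cmd, env = build_train_launch("my_method", num_gpus=2, record=True)
print(" ".join(cmd))
```

Passing the result to `subprocess.run(cmd, env=env, check=True)` would start training, provided `ns-train` is on the `PATH` and the method name is registered.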
We will provide more details and release layout data examples/scene models soon.
- Release detailed instructions for generation and visualization
- Release layout examples
- Release training code
- Instructions for preparing data
Thanks to these excellent open-source works: nerfstudio; diffuser.