Commit f86860d: initial commit

haotongl committed Oct 4, 2023 (0 parents)
Showing 136 changed files with 10,857 additions and 0 deletions.
16 changes: 16 additions & 0 deletions .gitignore
@@ -0,0 +1,16 @@
__pycache__/
.idea/
.ipynb_checkpoints/
.DS_Store
*.py[cod]
*.so
*.ply
*.orig
*.o
*.json
*.pth
*.npy
*.ipynb
*.png
*.jpg
data
18 changes: 18 additions & 0 deletions LICENSE
@@ -0,0 +1,18 @@
////////////////////////////////////////////////////////////////////////////
// Copyright 2022-2023 the 3D Vision Group at the State Key Lab of CAD&CG,
// Zhejiang University. All Rights Reserved.
//
// For more information see <https://github.com/zju3dv/Im4D>
// If you use this code, please cite the corresponding publications as
// listed on the above website.
//
// Permission to use, copy, modify and distribute this software and its
// documentation for educational, research and non-profit purposes only.
// Any modification based on this work must be open source, and commercial
// use is prohibited.
// You must retain, in the source form of any derivative works that you
// distribute, all copyright, patent, trademark, and attribution notices
// from the source form of this work.
//
//
////////////////////////////////////////////////////////////////////////////
162 changes: 162 additions & 0 deletions README.md
@@ -0,0 +1,162 @@
# Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes

### [Project Page (Coming Soon)](https://zju3dv.github.io/im4d) | [Paper](https://drive.google.com/file/d/1MOixYy-TESDvcoL9Qj4V7tDvafqDmibh/view?usp=sharing)
> [High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes](https://drive.google.com/file/d/1MOixYy-TESDvcoL9Qj4V7tDvafqDmibh/view?usp=sharing) \
> Haotong Lin, Sida Peng, Zhen Xu, Tao Xie, Xingyi He, Hujun Bao and Xiaowei Zhou \
> SIGGRAPH Asia 2023 conference track
![DNA-Rendering](https://github.com/haotongl/imgbed/raw/master/im4d/renbody.gif)

<!-- ![ENeRF-Outdoor](https://github.com/haotongl/imgbed/raw/master/im4d/enerf.gif) -->

## Installation

### Set up the python environment
<details> <summary>Tested on an Ubuntu workstation with an i9-12900K CPU and an RTX 3090 GPU</summary>

```
conda create -n im4d python=3.10
conda activate im4d
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia # pytorch 2.0.1
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install -r requirments.txt
```
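After installation, a quick sanity check can save time later (a minimal sketch; it only verifies that PyTorch sees the GPU and that the tiny-cuda-nn bindings import):

```
import torch
import tinycudann as tcnn  # bindings installed from the NVlabs repository above

print("torch", torch.__version__, "cuda", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("tiny-cuda-nn imported:", tcnn.__name__)
```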
</details>

### Set up datasets

<details> <summary>0. Set up workspace</summary>

The workspace is the disk directory that stores datasets, training logs, checkpoints and results. Please ensure it has enough disk space.

```
export workspace=$PATH_TO_YOUR_WORKSPACE
```
</details>

<details> <summary>1. Prepare ZJU-MoCap and NHR datasets.</summary>

Please refer to [mlp_maps](https://github.com/zju3dv/mlp_maps/blob/master/INSTALL.md) to download ZJU-MoCap and NHR datasets.
After downloading, place them into `$workspace/zju-mocap` and `$workspace/NHR`, respectively.
</details>
<details> <summary>2. [TODO] Prepare the DNA-Rendering dataset.</summary>

This dataset was originally released last year under the name RenBody, which is the version we used. It has since been renamed [DNA-Rendering](https://dna-rendering.github.io/index.html) and accepted to ICCV 2023. We are in contact with the dataset authors to confirm the latest data format and will provide the corresponding parsers.
</details>

<!-- <details> <summary>3. [TODO] Prepare the dynerf dataset.</summary> -->
<!-- </details> -->

<!-- <details> <summary>4. [TODO] Prepare the ENeRF-Outdoor dataset.</summary> -->
<!-- </details> -->

### Pre-trained models

Download the pre-trained models from [this link](https://drive.google.com/drive/folders/1_huSP1XOG-HttZwu-JxmICrsR9YQOpkm?usp=sharing) for a quick test. Place FILENAME.pth into\
`$workspace/trained_model/SCENE/im4d/FILENAME/latest.pth`. \
e.g., my_313.pth -> `$workspace/trained_model/my_313/im4d/my_313/latest.pth` \
my_313_demo.pth -> `$workspace/trained_model/my_313/im4d/my_313_demo/latest.pth`.
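For reference, a minimal sketch that copies downloaded checkpoints into this layout (it assumes the files sit in `~/Downloads` and that `$workspace` is set; adjust as needed):

```
import os
import shutil
from pathlib import Path

workspace = Path(os.environ["workspace"])
downloads = Path.home() / "Downloads"  # assumed download location

# map each checkpoint file to its (scene, experiment) directory
checkpoints = {
    "my_313.pth": ("my_313", "my_313"),
    "my_313_demo.pth": ("my_313", "my_313_demo"),
}
for filename, (scene, exp) in checkpoints.items():
    target = workspace / "trained_model" / scene / "im4d" / exp / "latest.pth"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(downloads / filename, target)
```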

## Testing

<details> <summary>1. Reproduce the quantitative results in the paper.</summary>

```
python run.py --type evaluate --cfg_file configs/exps/im4d/xx_dataset/xx_scene.yaml save_result True
```

For the NHR dataset, please first download [the preprocessed data](https://drive.google.com/drive/folders/1rA1gzzub6TkGIuu-LaqYwwwiJm4svK2F?usp=sharing) and place it into `$workspace/evaluation`. This evaluation setting follows [mlp_maps](https://zju3dv.github.io/mlp_maps/).
Then run one more command to report the PSNR metric:
```
python scripts/evaluate/im4d/eval_nhr.py --gt_path $workspace/evaluation/sport_1_easymocap --output_path $workspace/result/sport_1_easymocap/im4d/sport1_release/default/step00999999/rgb_0
```
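As a rough illustration of what the PSNR evaluation computes (a minimal sketch, not the actual `scripts/evaluate/im4d/eval_nhr.py`; it assumes the ground-truth and output directories contain identically named images):

```
import glob
import os

import imageio.v2 as imageio
import numpy as np

def psnr(gt, pred):
    # peak signal-to-noise ratio for images scaled to [0, 1]
    mse = np.mean((gt - pred) ** 2)
    return -10.0 * np.log10(mse)

def evaluate(gt_path, output_path):
    scores = []
    for gt_file in sorted(glob.glob(os.path.join(gt_path, "*.png"))):
        pred_file = os.path.join(output_path, os.path.basename(gt_file))
        gt = imageio.imread(gt_file).astype(np.float32) / 255.0
        pred = imageio.imread(pred_file).astype(np.float32) / 255.0
        scores.append(psnr(gt, pred))
    return float(np.mean(scores))
```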
</details>

<details> <summary>2. Speed up rendering.</summary>

First, precompute the binary fields.

```
python run.py --type cache_grid --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/cache_grid.yaml grid_tag default
```
You may need to adjust the frame range and `grid_resolution` to fit your scene.
For example, ZJU-MoCap scenes have 300 frames and their height is along the z-axis:
```
python run.py --type cache_grid --cfg_file configs/exps/im4d/zju/my_313.yaml --configs configs/components/opts/cache_grid.yaml grid_tag default grid_resolution 128,128,256 test_dataset.frame_sample 0,300,1
```
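Conceptually, the cached binary field is an occupancy grid over the scene bounding box that lets the renderer skip empty space. A hypothetical lookup is sketched below; the grid layout and names in the actual caching code may differ:

```
import numpy as np

def query_occupancy(points, grid, bbox_min, bbox_max):
    # points: (N, 3) world-space samples; grid: (X, Y, Z) boolean occupancy array
    res = np.array(grid.shape)
    uvw = (points - bbox_min) / (bbox_max - bbox_min)   # normalize into [0, 1]
    idx = np.clip((uvw * res).astype(np.int64), 0, res - 1)
    return grid[idx[:, 0], idx[:, 1], idx[:, 2]]

# with grid_resolution 128,128,256 the z (height) axis gets the finest sampling
grid = np.zeros((128, 128, 256), dtype=bool)
```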


Then, render images with the precomputed binary fields.

```
python run.py --type evaluate --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/fast_render.yaml grid_tag default save_result True
```

</details>


<details> <summary>3. Render a video along a selected trajectory.</summary>


```
python run.py --type evaluate --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/render_path/renbody_path.yaml
```
You can also render with the precomputed binary fields by adding one more argument:

```
python run.py --type evaluate --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/render_path/renbody_path.yaml --configs configs/components/opts/fast_render.yaml
```

For better results, you can use our pre-trained demo models, which are trained with all camera views.

```
python run.py --type evaluate --cfg_file configs/exps/im4d/zju/my_313.yaml --configs configs/components/opts/fast_render.yaml --configs configs/components/opts/render_path/zju_path.yaml exp_name_tag demo
```



</details>

## Training

```
python train_net.py --cfg_file configs/exps/im4d/xx_dataset/xx_scene.yaml
```

Training with multiple GPUs:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
export NUM_GPUS=4
export LOG_LEVEL=WARNING # INFO, DEBUG, WARNING
torchrun --nproc_per_node=$NUM_GPUS train_net.py --cfg_file configs/exps/im4d/xx_dataset/xx_scene.yaml --log_level $LOG_LEVEL distributed True
```


<!-- ## Results -->
<!-- We will release -->
## Running on custom datasets

<details> <summary>[TODO] 1. Custom mocap datasets.</summary>
</details>


## Acknowledgements
We would like to acknowledge the following inspiring prior work:
- [IBRNet: Learning Multi-View Image-Based Rendering](https://ibrnet.github.io/) (Wang et al.)
- [ENeRF: Efficient Neural Radiance Fields for Interactive Free-viewpoint Video](https://zju3dv.github.io/enerf) (Lin et al.)
- [K-Planes: Explicit Radiance Fields in Space, Time, and Appearance](https://sarafridov.github.io/K-Planes/) (Fridovich-Keil et al.)

Big thanks to [NeRFAcc](https://www.nerfacc.com/) (Li et al.) for their efficient implementation, which has significantly accelerated our rendering.

While refining our codebase, we have also incorporated basic implementations of ENeRF and K-Planes. These additions have not yet been thoroughly tested or aligned with the official codebases, but they may serve as useful resources for further exploration and development.

## Citation

If you find this code useful for your research, please cite it using the following BibTeX entry:

```
@inproceedings{lin2023im4d,
  title={High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes},
  author={Lin, Haotong and Peng, Sida and Xu, Zhen and Xie, Tao and He, Xingyi and Bao, Hujun and Zhou, Xiaowei},
  booktitle={SIGGRAPH Asia Conference Proceedings},
  year={2023}
}
```
54 changes: 54 additions & 0 deletions configs/3rdparty/deeplabv3_config/_base_/datasets/ade20k.py
@@ -0,0 +1,54 @@
# dataset settings
dataset_type = 'ADE20KDataset'
data_root = 'data/ade/ADEChallengeData2016'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='images/training',
        ann_dir='annotations/training',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=test_pipeline))
14 changes: 14 additions & 0 deletions configs/3rdparty/deeplabv3_config/_base_/default_runtime.py
@@ -0,0 +1,14 @@
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
44 changes: 44 additions & 0 deletions configs/3rdparty/deeplabv3_config/_base_/models/deeplabv3_r50-d8.py
@@ -0,0 +1,44 @@
# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='ASPPHead',
        in_channels=2048,
        in_index=3,
        channels=512,
        dilations=(1, 12, 24, 36),
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
9 changes: 9 additions & 0 deletions configs/3rdparty/deeplabv3_config/_base_/schedules/schedule_160k.py
@@ -0,0 +1,9 @@
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
# learning policy
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
# runtime settings
runner = dict(type='IterBasedRunner', max_iters=160000)
checkpoint_config = dict(by_epoch=False, interval=16000)
evaluation = dict(interval=16000, metric='mIoU', pre_eval=True)
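For reference, mmcv's `poly` policy decays the learning rate from the base value toward `min_lr` over training; roughly (a sketch of the schedule, assuming per-iteration decay since `by_epoch=False`):

```
def poly_lr(it, base_lr=0.01, min_lr=1e-4, power=0.9, max_iters=160000):
    # smooth polynomial decay from base_lr at iteration 0 to min_lr at max_iters
    coeff = (1 - it / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr

print(poly_lr(0), poly_lr(80000), poly_lr(160000))
```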
2 changes: 2 additions & 0 deletions configs/3rdparty/deeplabv3_config/deeplabv3_r101-d8_512x512_160k_ade20k.py
@@ -0,0 +1,2 @@
_base_ = './deeplabv3_r50-d8_512x512_160k_ade20k.py'
model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101))
6 changes: 6 additions & 0 deletions configs/3rdparty/deeplabv3_config/deeplabv3_r50-d8_512x512_160k_ade20k.py
@@ -0,0 +1,6 @@
_base_ = [
    './_base_/models/deeplabv3_r50-d8.py', './_base_/datasets/ade20k.py',
    './_base_/default_runtime.py', './_base_/schedules/schedule_160k.py'
]
model = dict(
    decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150))
32 changes: 32 additions & 0 deletions configs/components/datasets/base_dataset.yaml
@@ -0,0 +1,32 @@
train_dataset_module: lib.datasets.volcap.base_dataset
test_dataset_module: lib.datasets.volcap.base_dataset
render_path: False # whether to render a camera path
num_pixels: 1024 # number of pixels sampled from each image in each training iteration
white_bkgd: True
dataset_cfg: &dataset_cfg
  data_root: 'renbody'
  img_dir: 'images'
  img_frame_format: '{:06d}.jpg'
  msk_dir: 'maskes'
  msk_frame_format: '{:06d}.jpg'
  resize_ratio: 0.5
  special_resize_ratio: 0.375
  special_views: [48, 60, 1, -1] # if -1 is present, the list is expanded as np.arange(48, 60, 1); otherwise it is used as-is
  crop_h_w: [900, 600]
  input_view_sample: [0, 60, 1, -1]
  render_view_sample: [0, 60, 1, -1]
  test_views: [11, 25, 37, 57]
  preload_data: True
  imgs_per_batch: 8
  ignore_dist_k3: False
  bbox_type: 'RENBODY'
  shift_pixel: False
  near_far: [0.1, 100.]
train_dataset:
  <<: *dataset_cfg
  split: 'train'
  frame_sample: [0, 150, 1]
test_dataset:
  <<: *dataset_cfg
  split: 'test'
  frame_sample: [0, 150, 20]
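The comment on `special_views` describes a small expansion rule that the view-sampling lists share; a hypothetical sketch of that convention (the repository's actual parsing code may differ):

```
import numpy as np

def expand_views(views):
    # [start, stop, step, -1] is shorthand for np.arange(start, stop, step);
    # any other list is an explicit set of view indices
    if -1 in views:
        start, stop, step = views[:3]
        return np.arange(start, stop, step).tolist()
    return list(views)

assert expand_views([48, 60, 1, -1]) == list(range(48, 60))
assert expand_views([11, 25, 37, 57]) == [11, 25, 37, 57]
```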
14 changes: 14 additions & 0 deletions configs/components/datasets/mvibr.yaml
@@ -0,0 +1,14 @@
train_dataset_module: lib.datasets.volcap.ibr_dataset
test_dataset_module: lib.datasets.volcap.ibr_dataset
dataset_cfg: &dataset_cfg
  train_input_views: [2, 3, 4, 5]
  train_input_views_prob: [0.1, 0.35, 0.45, 0.1]
  test_input_views: 4
  crop_srcinps: True
  crop_padding: 5
  crop_align: 16
  imgs_per_batch: 1
train_dataset:
  <<: *dataset_cfg
test_dataset:
  <<: *dataset_cfg
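`train_input_views` and `train_input_views_prob` suggest that the number of source views fed to the image-based-rendering branch is drawn at random each iteration; a minimal illustration of that idea (hypothetical, not the repository's actual sampling code):

```
import numpy as np

rng = np.random.default_rng(0)
train_input_views = [2, 3, 4, 5]
train_input_views_prob = [0.1, 0.35, 0.45, 0.1]

# how many source views to use for this training sample
num_views = rng.choice(train_input_views, p=train_input_views_prob)
print(num_views)
```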