Commit f86860d: initial commit

haotongl committed Oct 4, 2023 (0 parents)
Showing 136 changed files with 10,857 additions and 0 deletions.
16 changes: 16 additions & 0 deletions .gitignore
@@ -0,0 +1,16 @@
__pycache__/
.idea/
.ipynb_checkpoints/
.DS_Store
*.py[cod]
*.so
*.ply
*.orig
*.o
*.json
*.pth
*.npy
*.ipynb
*.png
*.jpg
data
18 changes: 18 additions & 0 deletions LICENSE
@@ -0,0 +1,18 @@
////////////////////////////////////////////////////////////////////////////
// Copyright 2022-2023 the 3D Vision Group at the State Key Lab of CAD&CG,
// Zhejiang University. All Rights Reserved.
//
// For more information see <https://github.com/zju3dv/Im4D>
// If you use this code, please cite the corresponding publications as
// listed on the above website.
//
// Permission to use, copy, modify and distribute this software and its
// documentation for educational, research and non-profit purposes only.
// Any modification based on this work must be open source, and commercial
// use is prohibited.
// You must retain, in the source form of any derivative works that you
// distribute, all copyright, patent, trademark, and attribution notices
// from the source form of this work.
//
//
////////////////////////////////////////////////////////////////////////////
162 changes: 162 additions & 0 deletions README.md
@@ -0,0 +1,162 @@
# Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes

### [Project Page (Coming Soon)](https://zju3dv.github.io/im4d) | [Paper](https://drive.google.com/file/d/1MOixYy-TESDvcoL9Qj4V7tDvafqDmibh/view?usp=sharing)
> [High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes](https://drive.google.com/file/d/1MOixYy-TESDvcoL9Qj4V7tDvafqDmibh/view?usp=sharing) \
> Haotong Lin, Sida Peng, Zhen Xu, Tao Xie, Xingyi He, Hujun Bao and Xiaowei Zhou \
> SIGGRAPH Asia 2023 conference track
![DNA-Rendering](https://github.com/haotongl/imgbed/raw/master/im4d/renbody.gif)

<!-- ![ENeRF-Outdoor](https://github.com/haotongl/imgbed/raw/master/im4d/enerf.gif) -->

## Installation

### Set up the python environment
<details> <summary>Tested on an Ubuntu workstation with an i9-12900K CPU and an RTX 3090 GPU</summary>

```
conda create -n im4d python=3.10
conda activate im4d
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia # pytorch 2.0.1
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install -r requirments.txt
```
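After installation, a quick sanity check can save time later (a minimal sketch; it only verifies that PyTorch sees the GPU and that the tiny-cuda-nn bindings import):

```
import torch
import tinycudann as tcnn  # bindings installed from the NVlabs repository above

print("torch", torch.__version__, "cuda", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("tiny-cuda-nn imported:", tcnn.__name__)
```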
</details>

### Set up datasets

<details> <summary>0. Set up workspace</summary>

The workspace is the disk directory that stores datasets, training logs, checkpoints and results. Please ensure it has enough disk space.

```
export workspace=$PATH_TO_YOUR_WORKSPACE
```
</details>

<details> <summary>1. Prepare ZJU-MoCap and NHR datasets.</summary>

Please refer to [mlp_maps](https://github.com/zju3dv/mlp_maps/blob/master/INSTALL.md) to download ZJU-MoCap and NHR datasets.
After downloading, place them into `$workspace/zju-mocap` and `$workspace/NHR`, respectively.
</details>
<details> <summary>2. [TODO] Prepare the DNA-Rendering dataset.</summary>

This dataset was originally released last year under the name RenBody, which is the version we used. It has since been renamed [DNA-Rendering](https://dna-rendering.github.io/index.html) and accepted to ICCV 2023. We are in contact with the dataset authors to confirm the latest data format and will provide the corresponding parsers.
</details>

<!-- <details> <summary>3. [TODO] Prepare the dynerf dataset.</summary> -->
<!-- </details> -->

<!-- <details> <summary>4. [TODO] Prepare the ENeRF-Outdoor dataset.</summary> -->
<!-- </details> -->

### Pre-trained models

Download the pre-trained models from [this link](https://drive.google.com/drive/folders/1_huSP1XOG-HttZwu-JxmICrsR9YQOpkm?usp=sharing) for a quick test. Place FILENAME.pth into\
`$workspace/trained_model/SCENE/im4d/FILENAME/latest.pth`. \
e.g., my_313.pth -> `$workspace/trained_model/my_313/im4d/my_313/latest.pth` \
my_313_demo.pth -> `$workspace/trained_model/my_313/im4d/my_313_demo/latest.pth`.
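For reference, a minimal sketch that copies downloaded checkpoints into this layout (it assumes the files sit in `~/Downloads` and that `$workspace` is set; adjust as needed):

```
import os
import shutil
from pathlib import Path

workspace = Path(os.environ["workspace"])
downloads = Path.home() / "Downloads"  # assumed download location

# map each checkpoint file to its (scene, experiment) directory
checkpoints = {
    "my_313.pth": ("my_313", "my_313"),
    "my_313_demo.pth": ("my_313", "my_313_demo"),
}
for filename, (scene, exp) in checkpoints.items():
    target = workspace / "trained_model" / scene / "im4d" / exp / "latest.pth"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(downloads / filename, target)
```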

## Testing

<details> <summary>1. Reproduce the quantitative results in the paper.</summary>

```
python run.py --type evaluate --cfg_file configs/exps/im4d/xx_dataset/xx_scene.yaml save_result True
```

For the NHR dataset, please first download [the preprocessed data](https://drive.google.com/drive/folders/1rA1gzzub6TkGIuu-LaqYwwwiJm4svK2F?usp=sharing) and place it into `$workspace/evaluation`. This evaluation setting follows [mlp_maps](https://zju3dv.github.io/mlp_maps/).
Then run one more command to report the PSNR metric:
```
python scripts/evaluate/im4d/eval_nhr.py --gt_path $workspace/evaluation/sport_1_easymocap --output_path $workspace/result/sport_1_easymocap/im4d/sport1_release/default/step00999999/rgb_0
```
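As a rough illustration of what the PSNR evaluation computes (a minimal sketch, not the actual `scripts/evaluate/im4d/eval_nhr.py`; it assumes the ground-truth and output directories contain identically named images):

```
import glob
import os

import imageio.v2 as imageio
import numpy as np

def psnr(gt, pred):
    # peak signal-to-noise ratio for images scaled to [0, 1]
    mse = np.mean((gt - pred) ** 2)
    return -10.0 * np.log10(mse)

def evaluate(gt_path, output_path):
    scores = []
    for gt_file in sorted(glob.glob(os.path.join(gt_path, "*.png"))):
        pred_file = os.path.join(output_path, os.path.basename(gt_file))
        gt = imageio.imread(gt_file).astype(np.float32) / 255.0
        pred = imageio.imread(pred_file).astype(np.float32) / 255.0
        scores.append(psnr(gt, pred))
    return float(np.mean(scores))
```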
</details>

<details> <summary>2. Speed up rendering.</summary>

First, precompute the binary fields.

```
python run.py --type cache_grid --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/cache_grid.yaml grid_tag default
```
You may need to adjust the frame range and `grid_resolution` to fit your scene.
For example, ZJU-MoCap scenes have 300 frames and their height is along the z-axis:
```
python run.py --type cache_grid --cfg_file configs/exps/im4d/zju/my_313.yaml --configs configs/components/opts/cache_grid.yaml grid_tag default grid_resolution 128,128,256 test_dataset.frame_sample 0,300,1
```
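Conceptually, the cached binary field is an occupancy grid over the scene bounding box that lets the renderer skip empty space. A hypothetical lookup is sketched below; the grid layout and names in the actual caching code may differ:

```
import numpy as np

def query_occupancy(points, grid, bbox_min, bbox_max):
    # points: (N, 3) world-space samples; grid: (X, Y, Z) boolean occupancy array
    res = np.array(grid.shape)
    uvw = (points - bbox_min) / (bbox_max - bbox_min)   # normalize into [0, 1]
    idx = np.clip((uvw * res).astype(np.int64), 0, res - 1)
    return grid[idx[:, 0], idx[:, 1], idx[:, 2]]

# with grid_resolution 128,128,256 the z (height) axis gets the finest sampling
grid = np.zeros((128, 128, 256), dtype=bool)
```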


Then, render images with the precomputed binary fields.

```
python run.py --type evaluate --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/fast_render.yaml grid_tag default save_result True
```

</details>


<details> <summary>3. Render a video along a selected trajectory.</summary>


```
python run.py --type evaluate --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/render_path/renbody_path.yaml
```
You can also render with the precomputed binary fields by adding one more argument:

```
python run.py --type evaluate --cfg_file configs/exps/im4d/renbody/0013_01.yaml --configs configs/components/opts/render_path/renbody_path.yaml --configs configs/components/opts/fast_render.yaml
```

For better results, you can use our pre-trained demo models, which are trained with all camera views.

```
python run.py --type evaluate --cfg_file configs/exps/im4d/zju/my_313.yaml --configs configs/components/opts/fast_render.yaml --configs configs/components/opts/render_path/zju_path.yaml exp_name_tag demo
```



</details>

## Training

```
python train_net.py --cfg_file configs/exps/im4d/xx_dataset/xx_scene.yaml
```

Training with multiple GPUs:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
export NUM_GPUS=4
export LOG_LEVEL=WARNING # INFO, DEBUG, WARNING
torchrun --nproc_per_node=$NUM_GPUS train_net.py --cfg_file configs/exps/im4d/xx_dataset/xx_scene.yaml --log_level $LOG_LEVEL distributed True
```


<!-- ## Results -->
<!-- We will release -->
## Running on custom datasets

<details> <summary>[TODO] 1. Custom mocap datasets.</summary>
</details>


## Acknowledgements
We would like to acknowledge the following inspiring prior work:
- [IBRNet: Learning Multi-View Image-Based Rendering](https://ibrnet.github.io/) (Wang et al.)
- [ENeRF: Efficient Neural Radiance Fields for Interactive Free-viewpoint Video](https://zju3dv.github.io/enerf) (Lin et al.)
- [K-Planes: Explicit Radiance Fields in Space, Time, and Appearance](https://sarafridov.github.io/K-Planes/) (Fridovich-Keil et al.)

Big thanks to [NeRFAcc](https://www.nerfacc.com/) (Li et al.) for their efficient implementation, which has significantly accelerated our rendering.

While refining our codebase, we have also incorporated basic implementations of ENeRF and K-Planes. These additions have not yet been thoroughly tested or aligned with the official codebases, but they may serve as useful resources for further exploration and development.

## Citation

If you find this code useful for your research, please cite it using the following BibTeX entry:

```
@inproceedings{lin2023im4d,
  title={High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes},
  author={Lin, Haotong and Peng, Sida and Xu, Zhen and Xie, Tao and He, Xingyi and Bao, Hujun and Zhou, Xiaowei},
  booktitle={SIGGRAPH Asia Conference Proceedings},
  year={2023}
}
```
54 changes: 54 additions & 0 deletions configs/3rdparty/deeplabv3_config/_base_/datasets/ade20k.py
@@ -0,0 +1,54 @@
# dataset settings
dataset_type = 'ADE20KDataset'
data_root = 'data/ade/ADEChallengeData2016'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='images/training',
        ann_dir='annotations/training',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=test_pipeline))
14 changes: 14 additions & 0 deletions configs/3rdparty/deeplabv3_config/_base_/default_runtime.py
@@ -0,0 +1,14 @@
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
44 changes: 44 additions & 0 deletions configs/3rdparty/deeplabv3_config/_base_/models/deeplabv3_r50-d8.py
@@ -0,0 +1,44 @@
# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='ASPPHead',
        in_channels=2048,
        in_index=3,
        channels=512,
        dilations=(1, 12, 24, 36),
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
9 changes: 9 additions & 0 deletions configs/3rdparty/deeplabv3_config/_base_/schedules/schedule_160k.py
@@ -0,0 +1,9 @@
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
# learning policy
lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
# runtime settings
runner = dict(type='IterBasedRunner', max_iters=160000)
checkpoint_config = dict(by_epoch=False, interval=16000)
evaluation = dict(interval=16000, metric='mIoU', pre_eval=True)
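For reference, mmcv's `poly` policy decays the learning rate from the base value toward `min_lr` over training; roughly (a sketch of the schedule, assuming per-iteration decay since `by_epoch=False`):

```
def poly_lr(it, base_lr=0.01, min_lr=1e-4, power=0.9, max_iters=160000):
    # smooth polynomial decay from base_lr at iteration 0 to min_lr at max_iters
    coeff = (1 - it / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr

print(poly_lr(0), poly_lr(80000), poly_lr(160000))
```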
2 changes: 2 additions & 0 deletions configs/3rdparty/deeplabv3_config/deeplabv3_r101-d8_512x512_160k_ade20k.py
@@ -0,0 +1,2 @@
_base_ = './deeplabv3_r50-d8_512x512_160k_ade20k.py'
model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101))
6 changes: 6 additions & 0 deletions configs/3rdparty/deeplabv3_config/deeplabv3_r50-d8_512x512_160k_ade20k.py
@@ -0,0 +1,6 @@
_base_ = [
    './_base_/models/deeplabv3_r50-d8.py', './_base_/datasets/ade20k.py',
    './_base_/default_runtime.py', './_base_/schedules/schedule_160k.py'
]
model = dict(
    decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150))
32 changes: 32 additions & 0 deletions configs/components/datasets/base_dataset.yaml
@@ -0,0 +1,32 @@
train_dataset_module: lib.datasets.volcap.base_dataset
test_dataset_module: lib.datasets.volcap.base_dataset
render_path: False # whether to render a camera path
num_pixels: 1024 # number of pixels sampled from each image in each training iteration
white_bkgd: True
dataset_cfg: &dataset_cfg
  data_root: 'renbody'
  img_dir: 'images'
  img_frame_format: '{:06d}.jpg'
  msk_dir: 'maskes'
  msk_frame_format: '{:06d}.jpg'
  resize_ratio: 0.5
  special_resize_ratio: 0.375
  special_views: [48, 60, 1, -1] # if -1 is present, the list is expanded as np.arange(48, 60, 1); otherwise it is used as-is
  crop_h_w: [900, 600]
  input_view_sample: [0, 60, 1, -1]
  render_view_sample: [0, 60, 1, -1]
  test_views: [11, 25, 37, 57]
  preload_data: True
  imgs_per_batch: 8
  ignore_dist_k3: False
  bbox_type: 'RENBODY'
  shift_pixel: False
  near_far: [0.1, 100.]
train_dataset:
  <<: *dataset_cfg
  split: 'train'
  frame_sample: [0, 150, 1]
test_dataset:
  <<: *dataset_cfg
  split: 'test'
  frame_sample: [0, 150, 20]
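The comment on `special_views` describes a small expansion rule that the view-sampling lists share; a hypothetical sketch of that convention (the repository's actual parsing code may differ):

```
import numpy as np

def expand_views(views):
    # [start, stop, step, -1] is shorthand for np.arange(start, stop, step);
    # any other list is an explicit set of view indices
    if -1 in views:
        start, stop, step = views[:3]
        return np.arange(start, stop, step).tolist()
    return list(views)

assert expand_views([48, 60, 1, -1]) == list(range(48, 60))
assert expand_views([11, 25, 37, 57]) == [11, 25, 37, 57]
```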
14 changes: 14 additions & 0 deletions configs/components/datasets/mvibr.yaml
@@ -0,0 +1,14 @@
train_dataset_module: lib.datasets.volcap.ibr_dataset
test_dataset_module: lib.datasets.volcap.ibr_dataset
dataset_cfg: &dataset_cfg
  train_input_views: [2, 3, 4, 5]
  train_input_views_prob: [0.1, 0.35, 0.45, 0.1]
  test_input_views: 4
  crop_srcinps: True
  crop_padding: 5
  crop_align: 16
  imgs_per_batch: 1
train_dataset:
  <<: *dataset_cfg
test_dataset:
  <<: *dataset_cfg
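`train_input_views` and `train_input_views_prob` suggest that the number of source views fed to the image-based-rendering branch is drawn at random each iteration; a minimal illustration of that idea (hypothetical, not the repository's actual sampling code):

```
import numpy as np

rng = np.random.default_rng(0)
train_input_views = [2, 3, 4, 5]
train_input_views_prob = [0.1, 0.35, 0.45, 0.1]

# how many source views to use for this training sample
num_views = rng.choice(train_input_views, p=train_input_views_prob)
print(num_views)
```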