forked from NVIDIA/semantic-segmentation
0 parents, commit 3490c68, showing 15 changed files with 1,397 additions and 0 deletions.
@@ -0,0 +1,18 @@
ckpt
setenv
weights
*~
*.pyc
dump_imgs_train
tb
__pycache__/
.idea/
build/
*.egg-info/
dist/
*.py[cod]
*.swp
*.o
*.so
.torch
.DS_Store
@@ -0,0 +1,3 @@ | ||
[submodule "sdcnet/flownet2_pytorch"] | ||
path = sdcnet/flownet2_pytorch | ||
url = https://github.com/NVIDIA/flownet2-pytorch |
@@ -0,0 +1,17 @@
Copyright (C) 2019 NVIDIA Corporation. Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao and Bryan Catanzaro.
All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

Permission to use, copy, modify, and distribute this software and its documentation
for any non-commercial purpose is hereby granted without fee, provided that the above
copyright notice appear in all copies and that both that copyright notice and this
permission notice appear in supporting documentation, and that the name of the author
not be used in advertising or publicity pertaining to distribution of the software
without specific, written prior permission.

THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
@@ -0,0 +1,158 @@
# [Improving Semantic Segmentation via Video Prediction and Label Relaxation](https://nv-adlr.github.io/publication/2018-Segmentation)

![alt text](images/method.png)

## Installation

    # Get the Semantic Segmentation source code
    git clone https://github.com/NVIDIA/semantic-segmentation.git
    cd semantic-segmentation

    # Build the Docker image
    docker build -t nvidia-segmentation -f Dockerfile .

Our PyTorch implementation of semantic segmentation uses DeepLabV3-Plus to achieve SOTA results on Cityscapes. <br />
We are working on providing a detailed report; please bear with us. <br />
To propose a model or change for inclusion, please submit a pull request.

Multi-GPU training is supported, and the code provides examples for training and inference. <br />
For more help, type <br/>

    python train.py --help
## Network architectures

Below are the different base network architectures that are currently provided. <br />

- **WideResnet38**
- **SEResnext(50)-Stride8**

Our code also includes support for other model trunks, but these have not been tested with the current repo:
- **SEResnext(50)-Stride8**
- **Resnet(50,101)-Stride8**
- **Stride-16**
## Pre-trained Models
We've included pre-trained models. Download the checkpoints to a folder named `pretrained_models`.

* [pretrained_models/cityscapes_best.pth](https://drive.google.com/file/d/1P4kPaMY-SmQ3yPJQTJ7xMGAB_Su-1zTl/view?usp=sharing) [1071MB]
* [pretrained_models/camvid_best.pth](https://drive.google.com/file/d/1OzUCbFdXulB2P80Qxm7C3iNTeTP0Mvb_/view?usp=sharing) [1071MB]
* [pretrained_models/kitti_best.pth](https://drive.google.com/file/d/1OrTcqH_I3PHFiMlTTZJgBy8l_pladwtg/view?usp=sharing) [1071MB]
* [pretrained_models/sdc_cityscapes_vrec.pth.tar](https://drive.google.com/file/d/1OxnJo2tFEQs3vuY01ibPFjn3cRCo2yWt/view?usp=sharing) [38MB]
* [pretrained_models/FlowNet2_checkpoint.pth.tar](https://drive.google.com/file/d/1hF8vS6YeHkx3j2pfCeQqqZGwA_PJq_Da/view?usp=sharing) [620MB]
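After downloading, a quick way to sanity-check a checkpoint is to load it on the CPU and peek at its contents (a minimal sketch; it assumes the file is a standard `torch.save` archive, and the actual key names may vary between checkpoints):

```
import torch

# Load a downloaded checkpoint on the CPU and inspect its top-level keys
# (assumes a standard torch.save dict; the exact keys may differ per file).
ckpt = torch.load('pretrained_models/cityscapes_best.pth', map_location='cpu')
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:5])
```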
## Data Loaders

Dataloaders for Cityscapes, Mapillary, CamVid and KITTI are available in [datasets](./datasets). <br />

### Python requirements

Currently, the code supports
* Python 3
* Python packages
  * numpy
  * PyTorch (== 0.5.1, for <= 0.5.0)
  * sklearn
  * h5py
  * scikit-image
  * pillow
  * piexif
  * cffi
  * tqdm
  * dominate
  * tensorboardX
  * opencv-python
  * nose
  * ninja
* An NVIDIA GPU and CUDA 9.0 or higher. Some operations only have GPU implementations.
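Before training, a short environment check (a sketch, not part of the repo) confirms the PyTorch install and that a CUDA device is visible, since some operations only run on the GPU:

```
import torch

# Confirm the PyTorch install and CUDA visibility before launching training.
print('PyTorch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('Device:', torch.cuda.get_device_name(0))
```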
# Running the code

## Training

Dataloader: to run the code, you will have to change the data path location in `config.py` to point to your data.
Model arch: you can change the architecture name using `--arch`.

`./train.sh`
## Inference

Our inference code supports two evaluation paths: pooling-based and sliding-based. Pooling-based eval is faster than sliding-based eval, but gives slightly lower numbers.

`./eval.sh <weight_file>`
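For intuition, sliding-based evaluation tiles the image with overlapping crops, scores each crop, and averages the overlapping logits. Below is a conceptual sketch of that idea only (not the repo's actual eval code); `model` stands for any callable returning per-pixel class logits, and the crop/stride values are placeholders:

```
import torch

def sliding_eval(model, image, crop=768, stride=512, num_classes=19):
    """Conceptual sliding-window evaluation: score overlapping crops and
    average their logits. Not the repo's implementation."""

    def positions(extent):
        # Window start offsets covering the full extent; the last window is
        # flush with the far edge so no pixels are missed.
        if extent <= crop:
            return [0]
        return sorted(set(list(range(0, extent - crop, stride)) + [extent - crop]))

    _, _, H, W = image.shape
    logits = torch.zeros(1, num_classes, H, W)
    counts = torch.zeros(1, 1, H, W)
    for top in positions(H):
        for left in positions(W):
            tile = image[:, :, top:top + crop, left:left + crop]
            with torch.no_grad():
                out = model(tile)
            logits[:, :, top:top + crop, left:left + crop] += out
            counts[:, :, top:top + crop, left:left + crop] += 1
    return logits / counts.clamp(min=1)
```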
## Label propagation using Video Prediction
```
cd ./sdcnet
bash flownet2_pytorch/install.sh
./_eval.sh
```
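The underlying idea is that a predicted motion field can warp a label map from a labeled frame onto a neighbouring unlabeled frame. A minimal sketch of that warping step (an illustration assuming a dense flow field in pixels, not the SDCNet implementation):

```
import torch
import torch.nn.functional as F

def propagate_labels(labels, flow):
    """Warp a label map to a neighbouring frame with a dense flow field.
    labels: (1, 1, H, W) float tensor of class ids
    flow:   (1, 2, H, W) flow in pixels, channel 0 = dx, channel 1 = dy
    Illustration of the idea only, not the SDCNet implementation."""
    _, _, h, w = labels.shape
    xs = torch.arange(w).view(1, w).expand(h, w).float()
    ys = torch.arange(h).view(h, 1).expand(h, w).float()
    x = xs.unsqueeze(0) + flow[:, 0]   # where each output pixel samples from
    y = ys.unsqueeze(0) + flow[:, 1]
    # normalize sampling coordinates to [-1, 1] as grid_sample expects
    grid = torch.stack((2 * x / (w - 1) - 1, 2 * y / (h - 1) - 1), dim=-1)
    # nearest-neighbour sampling keeps label ids discrete
    return F.grid_sample(labels, grid, mode='nearest', align_corners=True)
```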
## Results on Cityscapes

![alt text](images/vis.png)

# Training IOU

Training results for WideResnet38 and SEResnext50 trained in fp16 on DGX-1 (8-GPU V100):

<table class="tg">
  <tr>
    <th class="tg-t2cw">Model Name</th>
    <th class="tg-t2cw">Mean IOU</th>
    <th class="tg-t2cw">Training Time</th>
  </tr>
  <tr>
    <td class="tg-rg0h">DeepWV3Plus (no sdc-aug)</td>
    <td class="tg-rg0h">81.4</td>
    <td class="tg-rg0h">~14 hrs</td>
  </tr>
  <tr>
    <td class="tg-rg0h">DeepSRNX50V3PlusD_m1 (no sdc-aug)</td>
    <td class="tg-rg0h">80.0</td>
    <td class="tg-rg0h">~9 hrs</td>
  </tr>
</table>
## Reference

If you find this implementation useful in your work, please acknowledge it appropriately and cite the paper or code accordingly:

```
@InProceedings{semantic_cvpr19,
  author = {Yi Zhu*, Karan Sapra*, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro},
  title = {Improving Semantic Segmentation via Video Propagation and Label Relaxation},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2019},
  url = {https://nv-adlr.github.io/publication/2018-Segmentation}
}
* indicates equal contribution
```

```
@misc{semantic-segmentation,
  author = {Karan Sapra, Fitsum A. Reda, Yi Zhu, Kevin Shih, Andrew Tao, Bryan Catanzaro},
  title = {semantic-segmentation: improving semantic segmentation via video propagation and label relaxation},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/NVIDIA/semantic-segmentation}}
}
```
We encourage people to contribute to our code base by providing suggestions, pointing out issues, and proposing fixes via merge requests, and we hope this repo is useful.
## Acknowledgments

Parts of the code were heavily derived from [pytorch-semantic-segmentation](https://github.com/ZijunDeng/pytorch-semantic-segmentation), [inplace-abn](https://github.com/mapillary/inplace_abn), [Pytorch](https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py), [ClementPinard/FlowNetPytorch](https://github.com/ClementPinard/FlowNetPytorch) and [Cadene](https://github.com/Cadene/pretrained-models.pytorch).

Our initial models used SyncBN from [Synchronized Batch Norm](https://github.com/zhanghang1989/PyTorch-Encoding), but they have since been ported to [Apex SyncBN](https://github.com/NVIDIA/apex), developed by Jie Jiang.

We would also like to thank Ming-Yu Liu and Peter Kontschieder.

## Coding Style
* 4 spaces for indentation rather than tabs
* 100 character line length
* PEP8 formatting
@@ -0,0 +1,21 @@
#!/usr/bin/env bash
# Run SDC2DRecon on the Cityscapes dataset

# Root folder of Cityscapes images
VAL_FILE=~/data/tmp/tinycs
SDC2DREC_CHECKPOINT=../pretrained_models/sdc_cityscapes_vrec.pth.tar
FLOWNET2_CHECKPOINT=../pretrained_models/FlowNet2_checkpoint.pth.tar

python3 main.py \
    --eval \
    --sequence_length 2 \
    --save ./ \
    --name __evalrun \
    --val_n_batches 1 \
    --write_images \
    --dataset FrameLoader \
    --model SDCNet2DRecon \
    --val_file ${VAL_FILE} \
    --resume ${SDC2DREC_CHECKPOINT} \
    --flownet2_checkpoint ${FLOWNET2_CHECKPOINT}
@@ -0,0 +1 @@
from .frame_loader import *
@@ -0,0 +1,17 @@
from __future__ import division
from __future__ import print_function

import torch

class StaticRandomCrop(object):
    """
    Helper class for a static random spatial crop: the offset is sampled once
    at construction time and then applied identically to every image.
    """
    def __init__(self, size, image_shape):
        h, w = image_shape
        self.th, self.tw = size
        self.h1 = torch.randint(0, h - self.th + 1, (1,)).item()
        self.w1 = torch.randint(0, w - self.tw + 1, (1,)).item()

    def __call__(self, img):
        return img[self.h1:(self.h1 + self.th), self.w1:(self.w1 + self.tw), :]
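A minimal usage sketch (editorial, not part of the commit): because the crop offset is drawn once, applying the same `StaticRandomCrop` instance to consecutive frames keeps them spatially aligned. It assumes NumPy HxWx3 frames and the import path used by `frame_loader.py`:

```
import numpy as np
from datasets.dataset_utils import StaticRandomCrop

# Two dummy frames from the same clip (HxWx3 uint8 arrays).
frames = [np.zeros((512, 1024, 3), dtype=np.uint8) for _ in range(2)]

# The offset is sampled once in __init__ and reused for every call,
# so both frames receive the identical 256x256 crop window.
cropper = StaticRandomCrop(size=(256, 256), image_shape=frames[0].shape[:2])
cropped = [cropper(f) for f in frames]
print(cropped[0].shape)  # (256, 256, 3)
```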
@@ -0,0 +1,102 @@
from __future__ import division
from __future__ import print_function

import os
import natsort
import numpy as np
import cv2

import torch
from torch.utils import data
from datasets.dataset_utils import StaticRandomCrop

class FrameLoader(data.Dataset):
    def __init__(self, args, root, is_training=False, transform=None):

        self.is_training = is_training
        self.transform = transform
        self.chsize = 3

        # carry over command line arguments
        assert args.sequence_length > 1, 'sequence length must be > 1'
        self.sequence_length = args.sequence_length

        assert args.sample_rate > 0, 'sample rate must be > 0'
        self.sample_rate = args.sample_rate

        self.crop_size = args.crop_size
        self.start_index = args.start_index
        self.stride = args.stride

        assert (os.path.exists(root))
        if self.is_training:
            self.start_index = 0

        # collect colors, motion vectors, and depth
        self.ref = self.collect_filelist(root)

        counts = [((len(el) - self.sequence_length) // self.sample_rate) for el in self.ref]
        self.total = np.sum(counts)
        self.cum_sum = list(np.cumsum([0] + [el for el in counts]))

    def collect_filelist(self, root):
        include_ext = [".png", ".jpg", "jpeg", ".bmp"]
        # collect subfolders, excluding hidden files, but following symlinks
        dirs = [x[0] for x in os.walk(root, followlinks=True) if not x[0].startswith('.')]

        # naturally sort, both dirs and individual images, while skipping hidden files
        dirs = natsort.natsorted(dirs)

        datasets = [
            [os.path.join(fdir, el) for el in natsort.natsorted(os.listdir(fdir))
             if os.path.isfile(os.path.join(fdir, el))
             and not el.startswith('.')
             and any([el.endswith(ext) for ext in include_ext])]
            for fdir in dirs
        ]

        return [el for el in datasets if el]

    def __len__(self):
        return self.total

    def __getitem__(self, index):
        # adjust index
        index = len(self) + index if index < 0 else index
        index = index + self.start_index

        dataset_index = np.searchsorted(self.cum_sum, index + 1)
        index = self.sample_rate * (index - self.cum_sum[np.maximum(0, dataset_index - 1)])

        image_list = self.ref[dataset_index - 1]
        input_files = [image_list[index + offset] for offset in range(self.sequence_length + 1)]

        # reverse image order with p=0.5
        if self.is_training and torch.randint(0, 2, (1,)).item():
            input_files = input_files[::-1]

        # images = [imageio.imread(imfile)[..., :self.chsize] for imfile in input_files]
        images = [cv2.imread(imfile)[..., :self.chsize] for imfile in input_files]
        input_shape = images[0].shape[:2]
        if self.is_training:
            cropper = StaticRandomCrop(self.crop_size, input_shape)
            images = map(cropper, images)

        # Pad images along height and width to fit them evenly into models.
        height, width = input_shape
        if (height % self.stride) != 0:
            padded_height = (height // self.stride + 1) * self.stride
            images = [np.pad(im, ((0, padded_height - height), (0, 0), (0, 0)), 'reflect') for im in images]

        if (width % self.stride) != 0:
            padded_width = (width // self.stride + 1) * self.stride
            images = [np.pad(im, ((0, 0), (0, padded_width - width), (0, 0)), 'reflect') for im in images]

        input_images = [torch.from_numpy(im.transpose(2, 0, 1)).float() for im in images]

        output_dict = {
            'image': input_images, 'ishape': input_shape, 'input_files': input_files
        }

        return output_dict
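A minimal usage sketch (editorial, not part of the commit): `FrameLoader` only reads a handful of fields from `args`, so a plain `Namespace` works as a stand-in for the real command-line parser; the root path and argument values below are placeholders:

```
from argparse import Namespace

import torch
from datasets import FrameLoader  # re-exported via datasets/__init__.py

# Stand-in for the command-line arguments FrameLoader actually reads.
args = Namespace(sequence_length=2, sample_rate=1, crop_size=(256, 256),
                 start_index=0, stride=64)

dataset = FrameLoader(args, root='/path/to/frames', is_training=False)
loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False)

batch = next(iter(loader))
print(len(batch['image']), batch['image'][0].shape)  # sequence_length + 1 padded frames
```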
Submodule flownet2_pytorch added at ad8c16