Commit 3490c68: first commit
fitsumreda committed Jun 14, 2019 · 0 parents
Showing 15 changed files with 1,397 additions and 0 deletions.
18 changes: 18 additions & 0 deletions .gitignore
@@ -0,0 +1,18 @@
ckpt
setenv
weights
*~
*.pyc
dump_imgs_train
tb
__pycache__/
.idea/
build/
*.egg-info/
dist/
*.py[cod]
*.swp
*.o
*.so
.torch
.DS_Store
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "sdcnet/flownet2_pytorch"]
path = sdcnet/flownet2_pytorch
url = https://github.com/NVIDIA/flownet2-pytorch
17 changes: 17 additions & 0 deletions LICENSE
@@ -0,0 +1,17 @@
Copyright (C) 2019 NVIDIA Corporation. Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao and Bryan Catanzaro.
All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

Permission to use, copy, modify, and distribute this software and its documentation
for any non-commercial purpose is hereby granted without fee, provided that the above
copyright notice appear in all copies and that both that copyright notice and this
permission notice appear in supporting documentation, and that the name of the author
not be used in advertising or publicity pertaining to distribution of the software
without specific, written prior permission.

THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
158 changes: 158 additions & 0 deletions README.md
@@ -0,0 +1,158 @@
# [Improving Semantic Segmentation via Video Prediction and Label Relaxation](https://nv-adlr.github.io/publication/2018-Segmentation)

![alt text](images/method.png)

## Installation

```bash
# Get Semantic Segmentation source code
git clone https://github.com/NVIDIA/semantic-segmentation.git
cd semantic-segmentation

# Build Docker Image
docker build -t nvidia-segmentation -f Dockerfile .
```

Our PyTorch implementation of semantic segmentation uses DeepLabV3+ to achieve state-of-the-art results on Cityscapes. <br />
We are working on providing a detailed report; please bear with us. <br />
To propose a model or change for inclusion, please submit a pull request.

Multiple GPU training is supported, and the code provides examples for training or inference. <br />
For more help, type <br/>

```bash
python train.py --help
```

## Network architectures

Below are the different base network architectures that are currently provided. <br />

- **WideResnet38**
- **SEResnext(50)-Stride8**

The code also supports other model trunks, but these have not been tested with the current repo:
- **SEResnext(50)-Stride8**
- **Resnet(50,101)-Stride8**
- **Stride-16**

## Pre-trained Models
We've included pre-trained models. Download checkpoints to a folder `pretrained_models`.

* [pretrained_models/cityscapes_best.pth](https://drive.google.com/file/d/1P4kPaMY-SmQ3yPJQTJ7xMGAB_Su-1zTl/view?usp=sharing)[1071MB]
* [pretrained_models/camvid_best.pth](https://drive.google.com/file/d/1OzUCbFdXulB2P80Qxm7C3iNTeTP0Mvb_/view?usp=sharing)[1071MB]
* [pretrained_models/kitti_best.pth](https://drive.google.com/file/d/1OrTcqH_I3PHFiMlTTZJgBy8l_pladwtg/view?usp=sharing)[1071MB]
* [pretrained_models/sdc_cityscapes_vrec.pth.tar](https://drive.google.com/file/d/1OxnJo2tFEQs3vuY01ibPFjn3cRCo2yWt/view?usp=sharing)[38MB]
* [pretrained_models/FlowNet2_checkpoint.pth.tar](https://drive.google.com/file/d/1hF8vS6YeHkx3j2pfCeQqqZGwA_PJq_Da/view?usp=sharing)[620MB]
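
If it helps, here is a minimal download sketch using the third-party `gdown` package (an assumption on our part; it is not one of this repo's dependencies). The file ID comes from the first Google Drive link above:

```python
# Sketch: fetch one checkpoint into pretrained_models/ using gdown
# (pip install gdown; gdown is not part of this repo's requirements).
import os
import gdown

os.makedirs('pretrained_models', exist_ok=True)
gdown.download(
    'https://drive.google.com/uc?id=1P4kPaMY-SmQ3yPJQTJ7xMGAB_Su-1zTl',
    'pretrained_models/cityscapes_best.pth',
    quiet=False,
)
```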


## Data Loaders

Dataloaders for Cityscapes, Mapillary, CamVid, and KITTI are available in [datasets](./datasets). <br />

### Python requirements

Currently, the code supports:
* Python 3
* Python Packages
  * numpy
  * PyTorch (== 0.5.1; versions <= 0.5.0 are not supported)
  * sklearn
  * h5py
  * scikit-image
  * pillow
  * piexif
  * cffi
  * tqdm
  * dominate
  * tensorboardX
  * opencv-python
  * nose
  * ninja
* An NVIDIA GPU and CUDA 9.0 or higher. Some operations only have a GPU implementation.

## Running the code

## Training

Dataloader: to run the code, you will have to change the data path location in `config.py` to point at your data.
Model arch: you can change the architecture name using `--arch`.

`./train.sh`
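
As a purely hypothetical illustration of the kind of edit meant here (the variable names below are invented for this sketch; the actual names in `config.py` may differ):

```python
# Hypothetical config.py excerpt. These variable names are illustrative only;
# consult the real config.py in this repo for the names it actually defines.
CITYSCAPES_DIR = '/path/to/cityscapes'  # root containing leftImg8bit/ and gtFine/
MAPILLARY_DIR = '/path/to/mapillary'
```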

## Inference

Our inference code supports two paths: pooling-based and sliding-based evaluation. Pooling-based eval is faster than sliding-based eval, but gives slightly lower numbers.

`./eval.sh <weight_file>`
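
For intuition, here is a generic sketch of sliding-based scoring; this is not the repo's eval implementation, and `model` is assumed to map an NCHW image tensor to per-pixel class logits:

```python
# Illustrative sliding-window evaluation: score overlapping tiles and average
# their logits. Assumes H, W >= crop and that (H - crop) and (W - crop) are
# multiples of stride, so every pixel is covered by at least one tile.
import torch

def sliding_eval(model, image, crop=512, stride=256, num_classes=19):
    _, _, H, W = image.shape
    logits = torch.zeros(1, num_classes, H, W)
    counts = torch.zeros(1, 1, H, W)
    for y in range(0, H - crop + 1, stride):
        for x in range(0, W - crop + 1, stride):
            tile = image[:, :, y:y + crop, x:x + crop]
            logits[:, :, y:y + crop, x:x + crop] += model(tile)
            counts[:, :, y:y + crop, x:x + crop] += 1
    return logits / counts  # per-pixel average of overlapping tile scores
```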

## Label propagation using Video Prediction
```
cd ./sdcnet
bash flownet2_pytorch/install.sh
./_eval.sh
```

## Results on Cityscapes

![alt text](images/vis.png)

## Training IOU

Training results for WideResnet38 and SEResnext50, trained in fp16 on a DGX-1 (8 V100 GPUs):

<table class="tg">
<tr>
<th class="tg-t2cw">Model Name</th>
<th class="tg-t2cw">Mean IOU</th>
<th class="tg-t2cw">Training Time</th>
</tr>
<tr>
<td class="tg-rg0h">DeepWV3Plus(no sdc-aug)</td>
<td class="tg-rg0h">81.4</td>
<td class="tg-rg0h">~14 hrs</td>
</tr>
<tr>
<td class="tg-rg0h">DeepSRNX50V3PlusD_m1(no sdc-aug)</td>
<td class="tg-rg0h">80.0</td>
<td class="tg-rg0h">~9 hrs</td>
</tr>
</table>

## Reference

If you find this implementation useful in your work, please acknowledge it appropriately and cite the paper or code accordingly:

```
@InProceedings{semantic_cvpr19,
author = {Yi Zhu* and Karan Sapra* and Fitsum A. Reda and Kevin J. Shih and Shawn Newsam and Andrew Tao and Bryan Catanzaro},
title = {Improving Semantic Segmentation via Video Propagation and Label Relaxation},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019},
url = {https://nv-adlr.github.io/publication/2018-Segmentation}
}
* indicates equal contribution
```

```
@misc{semantic-segmentation,
author = {Karan Sapra and Fitsum A. Reda and Yi Zhu and Kevin Shih and Andrew Tao and Bryan Catanzaro},
title = {semantic-segmentation: improving semantic segmentation via video propagation and label relaxation},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/NVIDIA/semantic-segmentation}}
}
```
We encourage people to contribute to our code base by providing suggestions, pointing out issues, or submitting solutions via merge requests. We hope this repo is useful.

## Acknowledgments

Parts of the code were heavily derived from [pytorch-semantic-segmentation](https://github.com/ZijunDeng/pytorch-semantic-segmentation), [inplace-abn](https://github.com/mapillary/inplace_abn), [PyTorch](https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py), [ClementPinard/FlowNetPytorch](https://github.com/ClementPinard/FlowNetPytorch), and [Cadene](https://github.com/Cadene/pretrained-models.pytorch).

Our initial models used SyncBN from [Synchronized Batch Norm](https://github.com/zhanghang1989/PyTorch-Encoding), but they have since been ported to [Apex SyncBN](https://github.com/NVIDIA/apex), developed by Jie Jiang.

We would also like to thank Ming-Yu Liu and Peter Kontschieder.

## Coding Style
* 4 spaces for indentation rather than tabs
* 100 character line length
* PEP8 formatting
21 changes: 21 additions & 0 deletions sdcnet/_eval.sh
@@ -0,0 +1,21 @@
#!/usr/bin/env bash
# Run SDC2DRecon on Cityscapes dataset

# Root folder of cityscapes images
VAL_FILE=~/data/tmp/tinycs
SDC2DREC_CHECKPOINT=../pretrained_models/sdc_cityscapes_vrec.pth.tar
FLOWNET2_CHECKPOINT=../pretrained_models/FlowNet2_checkpoint.pth.tar

python3 main.py \
--eval \
--sequence_length 2 \
--save ./ \
--name __evalrun \
--val_n_batches 1 \
--write_images \
--dataset FrameLoader \
--model SDCNet2DRecon \
--val_file ${VAL_FILE} \
--resume ${SDC2DREC_CHECKPOINT} \
--flownet2_checkpoint ${FLOWNET2_CHECKPOINT}

1 change: 1 addition & 0 deletions sdcnet/datasets/__init__.py
@@ -0,0 +1 @@
from .frame_loader import *
17 changes: 17 additions & 0 deletions sdcnet/datasets/dataset_utils.py
@@ -0,0 +1,17 @@
from __future__ import division
from __future__ import print_function

import torch

class StaticRandomCrop(object):
"""
Helper function for random spatial crop
"""
def __init__(self, size, image_shape):
h, w = image_shape
self.th, self.tw = size
self.h1 = torch.randint(0, h - self.th + 1, (1,)).item()
self.w1 = torch.randint(0, w - self.tw + 1, (1,)).item()

def __call__(self, img):
return img[self.h1:(self.h1 + self.th), self.w1:(self.w1 + self.tw), :]
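
A small usage sketch follows; the key property is that the crop offsets are sampled once in `__init__`, so a single instance applies the identical window to every frame of a clip (run from the `sdcnet` directory so the `datasets` package resolves):

```python
# One StaticRandomCrop instance applies the same crop window to every frame,
# which is what a video loader needs for temporally consistent crops.
import numpy as np
from datasets.dataset_utils import StaticRandomCrop

frames = [np.zeros((512, 1024, 3), dtype=np.uint8) for _ in range(3)]
cropper = StaticRandomCrop(size=(256, 256), image_shape=frames[0].shape[:2])
crops = [cropper(f) for f in frames]  # identical crop region for all frames
assert all(c.shape == (256, 256, 3) for c in crops)
```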
102 changes: 102 additions & 0 deletions sdcnet/datasets/frame_loader.py
@@ -0,0 +1,102 @@
from __future__ import division
from __future__ import print_function

import os
import natsort
import numpy as np
import cv2


import torch
from torch.utils import data
from datasets.dataset_utils import StaticRandomCrop

class FrameLoader(data.Dataset):
def __init__(self, args, root, is_training = False, transform=None):

self.is_training = is_training
self.transform = transform
self.chsize = 3

# carry over command line arguments
assert args.sequence_length > 1, 'sequence length must be > 1'
self.sequence_length = args.sequence_length

assert args.sample_rate > 0, 'sample rate must be > 0'
self.sample_rate = args.sample_rate

self.crop_size = args.crop_size
self.start_index = args.start_index
self.stride = args.stride

assert (os.path.exists(root))
if self.is_training:
self.start_index = 0

        # collect colors, motion vectors, and depth
self.ref = self.collect_filelist(root)

counts = [((len(el) - self.sequence_length) // (self.sample_rate)) for el in self.ref]
self.total = np.sum(counts)
self.cum_sum = list(np.cumsum([0] + [el for el in counts]))

def collect_filelist(self, root):
        include_ext = [".png", ".jpg", ".jpeg", ".bmp"]
# collect subfolders, excluding hidden files, but following symlinks
dirs = [x[0] for x in os.walk(root, followlinks=True) if not x[0].startswith('.')]

# naturally sort, both dirs and individual images, while skipping hidden files
dirs = natsort.natsorted(dirs)

datasets = [
[os.path.join(fdir, el) for el in natsort.natsorted(os.listdir(fdir))
if os.path.isfile(os.path.join(fdir, el))
and not el.startswith('.')
and any([el.endswith(ext) for ext in include_ext])]
for fdir in dirs
]

return [el for el in datasets if el]

def __len__(self):
return self.total

def __getitem__(self, index):
# adjust index
index = len(self) + index if index < 0 else index
index = index + self.start_index

        # map the flat index to a (clip folder, in-clip frame offset) pair
        dataset_index = np.searchsorted(self.cum_sum, index + 1)
        index = self.sample_rate * (index - self.cum_sum[np.maximum(0, dataset_index - 1)])

image_list = self.ref[dataset_index - 1]
        input_files = [image_list[index + offset] for offset in range(self.sequence_length + 1)]

# reverse image order with p=0.5
if self.is_training and torch.randint(0, 2, (1,)).item():
input_files = input_files[::-1]

# images = [imageio.imread(imfile)[..., :self.chsize] for imfile in input_files]
images = [cv2.imread(imfile)[..., :self.chsize] for imfile in input_files]
input_shape = images[0].shape[:2]
if self.is_training:
cropper = StaticRandomCrop(self.crop_size, input_shape)
            images = [cropper(im) for im in images]  # a list, not a single-use map iterator

# Pad images along height and width to fit them evenly into models.
height, width = input_shape
if (height % self.stride) != 0:
padded_height = (height // self.stride + 1) * self.stride
images = [ np.pad(im, ((0, padded_height - height), (0,0), (0,0)), 'reflect') for im in images]

if (width % self.stride) != 0:
padded_width = (width // self.stride + 1) * self.stride
images = [np.pad(im, ((0, 0), (0, padded_width - width), (0, 0)), 'reflect') for im in images]

input_images = [torch.from_numpy(im.transpose(2, 0, 1)).float() for im in images]

output_dict = {
'image': input_images, 'ishape': input_shape, 'input_files': input_files
}

return output_dict
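
A sketch of wiring FrameLoader into a standard PyTorch DataLoader; the `args` fields mirror the attributes read in `__init__` above, but the values here are illustrative, not recommended settings:

```python
# Illustrative wiring of FrameLoader into torch.utils.data.DataLoader.
import os
from types import SimpleNamespace
from torch.utils.data import DataLoader
from datasets.frame_loader import FrameLoader

args = SimpleNamespace(sequence_length=2, sample_rate=1,
                       crop_size=(256, 256), start_index=0, stride=64)
root = os.path.expanduser('~/data/tmp/tinycs')  # any folder tree of frames
dataset = FrameLoader(args, root, is_training=False)
loader = DataLoader(dataset, batch_size=1, shuffle=False)
sample = next(iter(loader))
# sample['image'] is a list of sequence_length + 1 NCHW float tensors
```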
1 change: 1 addition & 0 deletions sdcnet/flownet2_pytorch
Submodule flownet2_pytorch added at ad8c16