forked from NVIDIA/semantic-segmentation
0 parents, commit 3490c68, showing 15 changed files with 1,397 additions and 0 deletions.
@@ -0,0 +1,18 @@
ckpt
setenv
weights
*~
*.pyc
dump_imgs_train
tb
__pycache__/
.idea/
build/
*.egg-info/
dist/
*.py[cod]
*.swp
*.o
*.so
.torch
.DS_Store
@@ -0,0 +1,3 @@ | ||
[submodule "sdcnet/flownet2_pytorch"] | ||
path = sdcnet/flownet2_pytorch | ||
url = https://github.com/NVIDIA/flownet2-pytorch |
@@ -0,0 +1,17 @@
Copyright (C) 2019 NVIDIA Corporation. Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao and Bryan Catanzaro.
All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

Permission to use, copy, modify, and distribute this software and its documentation
for any non-commercial purpose is hereby granted without fee, provided that the above
copyright notice appear in all copies and that both that copyright notice and this
permission notice appear in supporting documentation, and that the name of the author
not be used in advertising or publicity pertaining to distribution of the software
without specific, written prior permission.

THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
@@ -0,0 +1,158 @@
# [Improving Semantic Segmentation via Video Prediction and Label Relaxation](https://nv-adlr.github.io/publication/2018-Segmentation)

![alt text](images/method.png)

## Installation

    # Get the Semantic Segmentation source code
    git clone https://github.com/NVIDIA/semantic-segmentation.git
    cd semantic-segmentation

    # Build the Docker image
    docker build -t nvidia-segmentation -f Dockerfile .

Our PyTorch implementation of semantic segmentation uses DeepLabV3-Plus to achieve SOTA results on Cityscapes. <br />
We are working on providing a detailed report; please bear with us. <br />
To propose a model or change for inclusion, please submit a pull request.

Multi-GPU training is supported, and the code provides examples for training and inference. <br />
For more help, type <br/>

    python train.py --help
## Network architectures

Below are the different base network architectures that are currently provided. <br />

- **WideResnet38**
- **SEResnext(50)-Stride8**

Our code also includes support for other model trunks, but these have not been tested with the current repo:
- **SEResnext(50)-Stride8**
- **Resnet(50,101)-Stride8**
- **Stride-16**
## Pre-trained Models
We've included pre-trained models. Download the checkpoints to a folder named `pretrained_models`.

* [pretrained_models/cityscapes_best.pth](https://drive.google.com/file/d/1P4kPaMY-SmQ3yPJQTJ7xMGAB_Su-1zTl/view?usp=sharing) [1071MB]
* [pretrained_models/camvid_best.pth](https://drive.google.com/file/d/1OzUCbFdXulB2P80Qxm7C3iNTeTP0Mvb_/view?usp=sharing) [1071MB]
* [pretrained_models/kitti_best.pth](https://drive.google.com/file/d/1OrTcqH_I3PHFiMlTTZJgBy8l_pladwtg/view?usp=sharing) [1071MB]
* [pretrained_models/sdc_cityscapes_vrec.pth.tar](https://drive.google.com/file/d/1OxnJo2tFEQs3vuY01ibPFjn3cRCo2yWt/view?usp=sharing) [38MB]
* [pretrained_models/FlowNet2_checkpoint.pth.tar](https://drive.google.com/file/d/1hF8vS6YeHkx3j2pfCeQqqZGwA_PJq_Da/view?usp=sharing) [620MB]
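After downloading, a quick way to sanity-check a checkpoint is to load it on the CPU and peek at its contents (a minimal sketch; it assumes the file is a standard `torch.save` archive, and the actual key names may vary between checkpoints):

```
import torch

# Load a downloaded checkpoint on the CPU and inspect its top-level keys
# (assumes a standard torch.save dict; the exact keys may differ per file).
ckpt = torch.load('pretrained_models/cityscapes_best.pth', map_location='cpu')
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:5])
```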
## Data Loaders

Dataloaders for Cityscapes, Mapillary, CamVid and KITTI are available in [datasets](./datasets). <br />

### Python requirements

Currently, the code supports
* Python 3
* Python packages
  * numpy
  * PyTorch (== 0.5.1, for <= 0.5.0)
  * sklearn
  * h5py
  * scikit-image
  * pillow
  * piexif
  * cffi
  * tqdm
  * dominate
  * tensorboardX
  * opencv-python
  * nose
  * ninja
* An NVIDIA GPU and CUDA 9.0 or higher. Some operations only have GPU implementations.
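Before training, a short environment check (a sketch, not part of the repo) confirms the PyTorch install and that a CUDA device is visible, since some operations only run on the GPU:

```
import torch

# Confirm the PyTorch install and CUDA visibility before launching training.
print('PyTorch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('Device:', torch.cuda.get_device_name(0))
```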
# Running the code

## Training

Dataloader: to run the code, you will have to change the data path location in `config.py` to point to your data.
Model arch: you can change the architecture name using `--arch`.

`./train.sh`
## Inference

Our inference code supports two evaluation paths: pooling-based and sliding-based. Pooling-based eval is faster than sliding-based eval, but gives slightly lower numbers.

`./eval.sh <weight_file>`
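For intuition, sliding-based evaluation tiles the image with overlapping crops, scores each crop, and averages the overlapping logits. Below is a conceptual sketch of that idea only (not the repo's actual eval code); `model` stands for any callable returning per-pixel class logits, and the crop/stride values are placeholders:

```
import torch

def sliding_eval(model, image, crop=768, stride=512, num_classes=19):
    """Conceptual sliding-window evaluation: score overlapping crops and
    average their logits. Not the repo's implementation."""

    def positions(extent):
        # Window start offsets covering the full extent; the last window is
        # flush with the far edge so no pixels are missed.
        if extent <= crop:
            return [0]
        return sorted(set(list(range(0, extent - crop, stride)) + [extent - crop]))

    _, _, H, W = image.shape
    logits = torch.zeros(1, num_classes, H, W)
    counts = torch.zeros(1, 1, H, W)
    for top in positions(H):
        for left in positions(W):
            tile = image[:, :, top:top + crop, left:left + crop]
            with torch.no_grad():
                out = model(tile)
            logits[:, :, top:top + crop, left:left + crop] += out
            counts[:, :, top:top + crop, left:left + crop] += 1
    return logits / counts.clamp(min=1)
```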
## Label propagation using Video Prediction
```
cd ./sdcnet
bash flownet2_pytorch/install.sh
./_eval.sh
```
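The underlying idea is that a predicted motion field can warp a label map from a labeled frame onto a neighbouring unlabeled frame. A minimal sketch of that warping step (an illustration assuming a dense flow field in pixels, not the SDCNet implementation):

```
import torch
import torch.nn.functional as F

def propagate_labels(labels, flow):
    """Warp a label map to a neighbouring frame with a dense flow field.
    labels: (1, 1, H, W) float tensor of class ids
    flow:   (1, 2, H, W) flow in pixels, channel 0 = dx, channel 1 = dy
    Illustration of the idea only, not the SDCNet implementation."""
    _, _, h, w = labels.shape
    xs = torch.arange(w).view(1, w).expand(h, w).float()
    ys = torch.arange(h).view(h, 1).expand(h, w).float()
    x = xs.unsqueeze(0) + flow[:, 0]   # where each output pixel samples from
    y = ys.unsqueeze(0) + flow[:, 1]
    # normalize sampling coordinates to [-1, 1] as grid_sample expects
    grid = torch.stack((2 * x / (w - 1) - 1, 2 * y / (h - 1) - 1), dim=-1)
    # nearest-neighbour sampling keeps label ids discrete
    return F.grid_sample(labels, grid, mode='nearest', align_corners=True)
```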
## Results on Cityscapes

![alt text](images/vis.png)

# Training IOU

Training results for WideResnet38 and SEResnext50 trained in fp16 on DGX-1 (8-GPU V100):

<table class="tg">
  <tr>
    <th class="tg-t2cw">Model Name</th>
    <th class="tg-t2cw">Mean IOU</th>
    <th class="tg-t2cw">Training Time</th>
  </tr>
  <tr>
    <td class="tg-rg0h">DeepWV3Plus (no sdc-aug)</td>
    <td class="tg-rg0h">81.4</td>
    <td class="tg-rg0h">~14 hrs</td>
  </tr>
  <tr>
    <td class="tg-rg0h">DeepSRNX50V3PlusD_m1 (no sdc-aug)</td>
    <td class="tg-rg0h">80.0</td>
    <td class="tg-rg0h">~9 hrs</td>
  </tr>
</table>
## Reference

If you find this implementation useful in your work, please acknowledge it appropriately and cite the paper or code accordingly:

```
@InProceedings{semantic_cvpr19,
  author = {Yi Zhu*, Karan Sapra*, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro},
  title = {Improving Semantic Segmentation via Video Propagation and Label Relaxation},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2019},
  url = {https://nv-adlr.github.io/publication/2018-Segmentation}
}
* indicates equal contribution
```

```
@misc{semantic-segmentation,
  author = {Karan Sapra, Fitsum A. Reda, Yi Zhu, Kevin Shih, Andrew Tao, Bryan Catanzaro},
  title = {semantic-segmentation: improving semantic segmentation via video propagation and label relaxation},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/NVIDIA/semantic-segmentation}}
}
```
We encourage people to contribute to our code base by providing suggestions, pointing out issues, and proposing fixes via merge requests, and we hope this repo is useful.
## Acknowledgments

Parts of the code were heavily derived from [pytorch-semantic-segmentation](https://github.com/ZijunDeng/pytorch-semantic-segmentation), [inplace-abn](https://github.com/mapillary/inplace_abn), [Pytorch](https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py), [ClementPinard/FlowNetPytorch](https://github.com/ClementPinard/FlowNetPytorch) and [Cadene](https://github.com/Cadene/pretrained-models.pytorch).

Our initial models used SyncBN from [Synchronized Batch Norm](https://github.com/zhanghang1989/PyTorch-Encoding), but they have since been ported to [Apex SyncBN](https://github.com/NVIDIA/apex), developed by Jie Jiang.

We would also like to thank Ming-Yu Liu and Peter Kontschieder.

## Coding Style
* 4 spaces for indentation rather than tabs
* 100 character line length
* PEP8 formatting
@@ -0,0 +1,21 @@
#!/usr/bin/env bash
# Run SDC2DRecon on the Cityscapes dataset

# Root folder of Cityscapes images
VAL_FILE=~/data/tmp/tinycs
SDC2DREC_CHECKPOINT=../pretrained_models/sdc_cityscapes_vrec.pth.tar
FLOWNET2_CHECKPOINT=../pretrained_models/FlowNet2_checkpoint.pth.tar

python3 main.py \
    --eval \
    --sequence_length 2 \
    --save ./ \
    --name __evalrun \
    --val_n_batches 1 \
    --write_images \
    --dataset FrameLoader \
    --model SDCNet2DRecon \
    --val_file ${VAL_FILE} \
    --resume ${SDC2DREC_CHECKPOINT} \
    --flownet2_checkpoint ${FLOWNET2_CHECKPOINT}
@@ -0,0 +1 @@
from .frame_loader import *
@@ -0,0 +1,17 @@
from __future__ import division
from __future__ import print_function

import torch

class StaticRandomCrop(object):
    """
    Helper class for a static random spatial crop: the offset is sampled once
    at construction time and then applied identically to every image.
    """
    def __init__(self, size, image_shape):
        h, w = image_shape
        self.th, self.tw = size
        self.h1 = torch.randint(0, h - self.th + 1, (1,)).item()
        self.w1 = torch.randint(0, w - self.tw + 1, (1,)).item()

    def __call__(self, img):
        return img[self.h1:(self.h1 + self.th), self.w1:(self.w1 + self.tw), :]
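A minimal usage sketch (editorial, not part of the commit): because the crop offset is drawn once, applying the same `StaticRandomCrop` instance to consecutive frames keeps them spatially aligned. It assumes NumPy HxWx3 frames and the import path used by `frame_loader.py`:

```
import numpy as np
from datasets.dataset_utils import StaticRandomCrop

# Two dummy frames from the same clip (HxWx3 uint8 arrays).
frames = [np.zeros((512, 1024, 3), dtype=np.uint8) for _ in range(2)]

# The offset is sampled once in __init__ and reused for every call,
# so both frames receive the identical 256x256 crop window.
cropper = StaticRandomCrop(size=(256, 256), image_shape=frames[0].shape[:2])
cropped = [cropper(f) for f in frames]
print(cropped[0].shape)  # (256, 256, 3)
```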
@@ -0,0 +1,102 @@
from __future__ import division
from __future__ import print_function

import os
import natsort
import numpy as np
import cv2

import torch
from torch.utils import data
from datasets.dataset_utils import StaticRandomCrop

class FrameLoader(data.Dataset):
    def __init__(self, args, root, is_training=False, transform=None):

        self.is_training = is_training
        self.transform = transform
        self.chsize = 3

        # carry over command line arguments
        assert args.sequence_length > 1, 'sequence length must be > 1'
        self.sequence_length = args.sequence_length

        assert args.sample_rate > 0, 'sample rate must be > 0'
        self.sample_rate = args.sample_rate

        self.crop_size = args.crop_size
        self.start_index = args.start_index
        self.stride = args.stride

        assert (os.path.exists(root))
        if self.is_training:
            self.start_index = 0

        # collect colors, motion vectors, and depth
        self.ref = self.collect_filelist(root)

        counts = [((len(el) - self.sequence_length) // self.sample_rate) for el in self.ref]
        self.total = np.sum(counts)
        self.cum_sum = list(np.cumsum([0] + [el for el in counts]))

    def collect_filelist(self, root):
        include_ext = [".png", ".jpg", "jpeg", ".bmp"]
        # collect subfolders, excluding hidden files, but following symlinks
        dirs = [x[0] for x in os.walk(root, followlinks=True) if not x[0].startswith('.')]

        # naturally sort, both dirs and individual images, while skipping hidden files
        dirs = natsort.natsorted(dirs)

        datasets = [
            [os.path.join(fdir, el) for el in natsort.natsorted(os.listdir(fdir))
             if os.path.isfile(os.path.join(fdir, el))
             and not el.startswith('.')
             and any([el.endswith(ext) for ext in include_ext])]
            for fdir in dirs
        ]

        return [el for el in datasets if el]

    def __len__(self):
        return self.total

    def __getitem__(self, index):
        # adjust index
        index = len(self) + index if index < 0 else index
        index = index + self.start_index

        dataset_index = np.searchsorted(self.cum_sum, index + 1)
        index = self.sample_rate * (index - self.cum_sum[np.maximum(0, dataset_index - 1)])

        image_list = self.ref[dataset_index - 1]
        input_files = [image_list[index + offset] for offset in range(self.sequence_length + 1)]

        # reverse image order with p=0.5
        if self.is_training and torch.randint(0, 2, (1,)).item():
            input_files = input_files[::-1]

        # images = [imageio.imread(imfile)[..., :self.chsize] for imfile in input_files]
        images = [cv2.imread(imfile)[..., :self.chsize] for imfile in input_files]
        input_shape = images[0].shape[:2]
        if self.is_training:
            cropper = StaticRandomCrop(self.crop_size, input_shape)
            images = map(cropper, images)

        # Pad images along height and width to fit them evenly into models.
        height, width = input_shape
        if (height % self.stride) != 0:
            padded_height = (height // self.stride + 1) * self.stride
            images = [np.pad(im, ((0, padded_height - height), (0, 0), (0, 0)), 'reflect') for im in images]

        if (width % self.stride) != 0:
            padded_width = (width // self.stride + 1) * self.stride
            images = [np.pad(im, ((0, 0), (0, padded_width - width), (0, 0)), 'reflect') for im in images]

        input_images = [torch.from_numpy(im.transpose(2, 0, 1)).float() for im in images]

        output_dict = {
            'image': input_images, 'ishape': input_shape, 'input_files': input_files
        }

        return output_dict
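A minimal usage sketch (editorial, not part of the commit): `FrameLoader` only reads a handful of fields from `args`, so a plain `Namespace` works as a stand-in for the real command-line parser; the root path and argument values below are placeholders:

```
from argparse import Namespace

import torch
from datasets import FrameLoader  # re-exported via datasets/__init__.py

# Stand-in for the command-line arguments FrameLoader actually reads.
args = Namespace(sequence_length=2, sample_rate=1, crop_size=(256, 256),
                 start_index=0, stride=64)

dataset = FrameLoader(args, root='/path/to/frames', is_training=False)
loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False)

batch = next(iter(loader))
print(len(batch['image']), batch['image'][0].shape)  # sequence_length + 1 padded frames
```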
Submodule flownet2_pytorch added at ad8c16