@misc{CV2018,
author = {Donny You},
howpublished = {\url{https://github.com/youansheng/torchcv}},
year = {2018}
}
This repository provides source code for most deep-learning-based computer vision problems. We will do our best to keep it up to date. If you find a problem with this repository, please raise an issue or submit a pull request.
- Image Classification
  - VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition
  - ResNet: Deep Residual Learning for Image Recognition
  - DenseNet: Densely Connected Convolutional Networks
  - ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
  - ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
- Semantic Segmentation
  - DeepLabV3: Rethinking Atrous Convolution for Semantic Image Segmentation
  - PSPNet: Pyramid Scene Parsing Network
  - DenseASPP: DenseASPP for Semantic Segmentation in Street Scenes
- Object Detection
  - SSD: Single Shot MultiBox Detector
  - Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
  - YOLOv3: An Incremental Improvement
  - FPN: Feature Pyramid Networks for Object Detection
- Pose Estimation
  - CPM: Convolutional Pose Machines
  - OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
- Instance Segmentation
  - Mask R-CNN
TorchCV currently supports only Python 3.x and PyTorch 0.4.1. An update to PyTorch 1.0 is planned.
pip3 install -r requirements.txt
cd extensions
sh make.sh
All results shown below were obtained with this repository's implementations and fully reproduce the results reported in the papers.
- ResNet: Deep Residual Learning for Image Recognition
- Cityscapes (Single Scale Whole Image Test): Base LR 0.01, Crop Size 769
Model | Backbone | Train | Test | mIOU | BS | Iters | Scripts |
---|---|---|---|---|---|---|---|
PSPNet | 3x3-Res101 | train | val | 78.20 | 8 | 40K | PSPNet |
DeepLabV3 | 3x3-Res101 | train | val | 79.13 | 8 | 40K | DeepLabV3 |
- ADE20K (Single Scale Whole Image Test): Base LR 0.02, Crop Size 520
Model | Backbone | Train | Test | mIOU | PixelACC | BS | Iters | Scripts |
---|---|---|---|---|---|---|---|---|
PSPNet | 3x3-Res50 | train | val | 41.52 | 80.09 | 16 | 150K | PSPNet |
DeepLabV3 | 3x3-Res50 | train | val | 42.16 | 80.36 | 16 | 150K | DeepLabV3 |
PSPNet | 3x3-Res101 | train | val | 43.60 | 81.30 | 16 | 150K | PSPNet |
DeepLabV3 | 3x3-Res101 | train | val | 44.13 | 81.42 | 16 | 150K | DeepLabV3 |
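
For readers unfamiliar with the mIOU and PixelACC columns, the following is a minimal sketch of how these two metrics are commonly computed from a confusion matrix. It is not taken from TorchCV's code: the function names and the `ignore_index=255` convention are assumptions made for illustration, and the inputs are flat lists of class ids rather than tensors.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels per (ground truth, prediction) pair.

    Rows index the ground-truth class, columns the predicted class.
    Pixels labelled ignore_index (an assumed convention) are skipped.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix):
    """Fraction of counted pixels whose prediction equals the label."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def mean_iou(matrix):
    """Average intersection-over-union over classes that occur at all."""
    n = len(matrix)
    ious = []
    for c in range(n):
        true_positive = matrix[c][c]
        ground_truth = sum(matrix[c])
        predicted = sum(matrix[r][c] for r in range(n))
        union = ground_truth + predicted - true_positive
        if union > 0:
            ious.append(true_positive / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, the last pixel is ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
cm = confusion_matrix(pred, target, num_classes=3)
print(pixel_accuracy(cm), mean_iou(cm))
```

Both functions return fractions in [0, 1]; the tables report the same quantities as percentages.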
- Pascal VOC2007/2012 (Single Scale Test): 20 Classes
Model | Backbone | Train | Test | mAP | BS | Epochs | Scripts |
---|---|---|---|---|---|---|---|
SSD300 | VGG16 | 07+12_trainval | 07_test | 0.786 | 32 | 235 | SSD300 |
SSD512 | VGG16 | 07+12_trainval | 07_test | 0.808 | 32 | 235 | SSD512 |
Faster R-CNN | VGG16 | 07_trainval | 07_test | 0.706 | 1 | 15 | Faster R-CNN |
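
The mAP column is the mean of per-class average precision over the 20 classes. The sketch below assumes the 11-point interpolated AP traditionally used with the VOC2007 test set; the source does not state which AP variant was used, so treat this as an illustration only. The function names are hypothetical, and each detection is assumed to have already been marked as a true or false positive by IoU matching against the ground truth.

```python
def average_precision_11pt(scores, is_true_positive, num_ground_truth):
    """11-point interpolated AP for one class (assumed VOC2007-style)."""
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    ap = 0.0
    for k in range(11):
        threshold = k / 10
        # Best precision among operating points with recall >= threshold.
        best = max(
            (p for p, r in zip(precisions, recalls) if r >= threshold),
            default=0.0,
        )
        ap += best / 11
    return ap


def mean_average_precision(per_class_ap):
    """Average the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Toy example: three detections for one class with two ground-truth boxes.
ap = average_precision_11pt([0.9, 0.8, 0.7], [True, False, True], 2)
print(ap)
```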
- OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
- Mask R-CNN
TorchCV defines a dataset format for every task; see the subdirectories of datasets for details. The following is an example dataset directory tree for training semantic segmentation.
- You can preprocess public datasets with the scripts in the folder datasets/seg/preprocess
    DataSet
        train
            image
                00001.jpg/png
                00002.jpg/png
                ...
            label
                00001.png
                00002.png
                ...
        val
            image
                00001.jpg/png
                00002.jpg/png
                ...
            label
                00001.png
                00002.png
                ...
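
The layout above suggests that every file under image has a label with the same file stem under label. The following is a minimal sketch that checks this and pairs the files, assuming that naming rule holds. `list_pairs` is a hypothetical helper written for illustration and is not part of TorchCV.

```python
from pathlib import Path

# Image extensions shown in the example tree; labels are assumed to be PNG.
IMAGE_SUFFIXES = {".jpg", ".png"}


def list_pairs(dataset_root, split):
    """Return sorted (image_path, label_path) pairs for 'train' or 'val'.

    Raises FileNotFoundError if an image has no label with the same stem.
    """
    split_dir = Path(dataset_root) / split
    labels = {p.stem: p for p in (split_dir / "label").glob("*.png")}
    pairs = []
    for image in sorted((split_dir / "image").iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip stray files such as notes or hidden files
        label = labels.get(image.stem)
        if label is None:
            raise FileNotFoundError(f"no label found for {image.name}")
        pairs.append((image, label))
    return pairs
```

Calling `list_pairs("DataSet", "train")` before training is a cheap way to catch a missing or misnamed label early.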
Take PSPNet as an example. ("tag" can be any string, including an empty one.)
- Training
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag
- Resume Training
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag
- Validate
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh val tag
- Testing
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh test tag
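
If you prefer to launch these phases from Python, the commands above can be wrapped as in the sketch below. The script name, directory, and the three phase names come from the commands listed above; `build_command` and `run_phase` are hypothetical helpers, and how the script treats an empty tag is not verified here.

```python
import subprocess

SCRIPT_DIR = "scripts/seg/cityscapes/"
SCRIPT = "run_fs_pspnet_cityscapes_seg.sh"
PHASES = ("train", "val", "test")


def build_command(phase, tag=""):
    """Build the argument list for one phase of the PSPNet script."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase {phase!r}, expected one of {PHASES}")
    return ["bash", SCRIPT, phase, tag]


def run_phase(phase, tag="", dry_run=True):
    """Run the script from its directory, or only return the command."""
    command = build_command(phase, tag)
    if not dry_run:
        # check=True raises CalledProcessError if the script fails.
        subprocess.run(command, cwd=SCRIPT_DIR, check=True)
    return command


# Dry run: show what would be executed for validation.
print(run_phase("val", "tag"))
```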