A paper list of object detection using deep learning. I wrote this page with reference to this survey paper and a lot of searching.
Last updated: 2019/03/18
2018/9/18 - update all recent papers and add a diagram of the history of object detection using deep learning.
2018/9/26 - update codes of papers. (official and unofficial)
2018/october - update 5 papers and performance table.
2018/november - update 9 papers.
2018/december - update 8 papers and performance table and add new diagram (2019 version!!).
2019/january - update 4 papers and add commonly used datasets.
2019/february - update 3 papers.
2019/march - update figure and code links.
Papers highlighted in red are the ones I consider must-reads. This is only my personal opinion, and the other papers are important too, so I recommend reading them if you have time.
The FPS (speed) figure depends on the hardware (e.g. CPU, GPU, RAM), so it is hard to compare speeds fairly across papers. The solution would be to measure every model on hardware with equivalent specifications, but that is very difficult and time-consuming.
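As a rough illustration of what measuring on equivalent hardware could look like, the sketch below times an arbitrary inference callable with one shared harness on a single machine. The names `measure_fps` and `infer`, and the default warm-up and repeat counts, are assumptions made for this example and do not come from any of the listed papers. GPU frameworks often run asynchronously, so a real benchmark would also need to synchronise the device before reading the clock.

```python
import time
from typing import Any, Callable, Sequence


def measure_fps(infer: Callable[[Any], Any], images: Sequence[Any],
                warmup: int = 5, repeats: int = 3) -> float:
    """Return how many images per second `infer` processes on this machine."""
    if not images:
        raise ValueError("need at least one image")

    # Warm-up passes so one-off costs (caches, lazy initialisation) are not timed.
    for i in range(warmup):
        infer(images[i % len(images)])

    # Timed passes: every image is processed `repeats` times.
    start = time.perf_counter()
    for _ in range(repeats):
        for image in images:
            infer(image)
    elapsed = time.perf_counter() - start

    # Guard against a zero reading when `infer` is extremely cheap.
    return (repeats * len(images)) / max(elapsed, 1e-12)


if __name__ == "__main__":
    # Stand-in "model": a hypothetical function that just sums pixel values.
    fake_images = [[0.1] * 1000 for _ in range(8)]
    print(f"{measure_fps(sum, fake_images):.1f} images/s")
```

Running every detector through the same harness on the same machine gives numbers that are comparable with each other, although they still cannot be compared with numbers reported on different hardware.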
Detector | VOC07 (mAP@IoU=0.5) | VOC12 (mAP@IoU=0.5) | COCO (mAP@IoU=0.5:0.95) | Published In |
---|---|---|---|---|
R-CNN | 58.5 | - | - | CVPR'14 |
SPP-Net | 59.2 | - | - | ECCV'14 |
MR-CNN | 78.2 (07+12) | 73.9 (07+12) | - | ICCV'15 |
Fast R-CNN | 70.0 (07+12) | 68.4 (07++12) | 19.7 | ICCV'15 |
Faster R-CNN | 73.2 (07+12) | 70.4 (07++12) | 21.9 | NIPS'15 |
YOLO v1 | 66.4 (07+12) | 57.9 (07++12) | - | CVPR'16 |
G-CNN | 66.8 | 66.4 (07+12) | - | CVPR'16 |
AZNet | 70.4 | - | 22.3 | CVPR'16 |
ION | 80.1 | 77.9 | 33.1 | CVPR'16 |
HyperNet | 76.3 (07+12) | 71.4 (07++12) | - | CVPR'16 |
OHEM | 78.9 (07+12) | 76.3 (07++12) | 22.4 | CVPR'16 |
MPN | - | - | 33.2 | BMVC'16 |
SSD | 76.8 (07+12) | 74.9 (07++12) | 31.2 | ECCV'16 |
GBDNet | 77.2 (07+12) | - | 27.0 | ECCV'16 |
CPF | 76.4 (07+12) | 72.6 (07++12) | - | ECCV'16 |
R-FCN | 79.5 (07+12) | 77.6 (07++12) | 29.9 | NIPS'16 |
DeepID-Net | 69.0 | - | - | TPAMI'16 |
NoC | 71.6 (07+12) | 68.8 (07+12) | 27.2 | TPAMI'16 |
DSSD | 81.5 (07+12) | 80.0 (07++12) | 33.2 | arXiv'17 |
TDM | - | - | 37.3 | CVPR'17 |
FPN | - | - | 36.2 | CVPR'17 |
YOLO v2 | 78.6 (07+12) | 73.4 (07++12) | - | CVPR'17 |
RON | 77.6 (07+12) | 75.4 (07++12) | 27.4 | CVPR'17 |
DeNet | 77.1 (07+12) | 73.9 (07++12) | 33.8 | ICCV'17 |
CoupleNet | 82.7 (07+12) | 80.4 (07++12) | 34.4 | ICCV'17 |
RetinaNet | - | - | 39.1 | ICCV'17 |
DSOD | 77.7 (07+12) | 76.3 (07++12) | - | ICCV'17 |
SMN | 70.0 | - | - | ICCV'17 |
Light-Head R-CNN | - | - | 41.5 | arXiv'17 |
YOLO v3 | - | - | 33.0 | arXiv'18 |
SIN | 76.0 (07+12) | 73.1 (07++12) | 23.2 | CVPR'18 |
STDN | 80.9 (07+12) | - | - | CVPR'18 |
RefineDet | 83.8 (07+12) | 83.5 (07++12) | 41.8 | CVPR'18 |
SNIP | - | - | 45.7 | CVPR'18 |
Relation-Network | - | - | 32.5 | CVPR'18 |
Cascade R-CNN | - | - | 42.8 | CVPR'18 |
MLKP | 80.6 (07+12) | 77.2 (07++12) | 28.6 | CVPR'18 |
Fitness-NMS | - | - | 41.8 | CVPR'18 |
RFBNet | 82.2 (07+12) | - | - | ECCV'18 |
CornerNet | - | - | 42.1 | ECCV'18 |
PFPNet | 84.1 (07+12) | 83.7 (07++12) | 39.4 | ECCV'18 |
Pelee | 70.9 (07+12) | - | - | NIPS'18 |
HKRM | 78.8 (07+12) | - | 37.8 | NIPS'18 |
M2Det | - | - | 44.2 | AAAI'19 |
R-DAD | 81.2 (07+12) | 82.0 (07++12) | 43.1 | AAAI'19 |
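The column headers in the table above refer to the intersection-over-union (IoU) between a predicted box and a ground-truth box. The VOC columns count a detection as correct when IoU is at least 0.5, while the COCO column averages AP over the ten thresholds 0.50, 0.55, ..., 0.95. The following is a minimal sketch of that overlap test only, not a full mAP implementation. The corner box format `(x1, y1, x2, y2)` and the helper names `iou` and `thresholds_passed` are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]

# The ten IoU thresholds behind "mAP@IoU=0.5:0.95": 0.50, 0.55, ..., 0.95.
COCO_THRESHOLDS: List[float] = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred: Box, gt: Box) -> List[float]:
    """IoU thresholds at which `pred` would count as a match for `gt`."""
    overlap = iou(pred, gt)
    return [t for t in COCO_THRESHOLDS if overlap >= t]


if __name__ == "__main__":
    pred, gt = (0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 7.7)
    print(iou(pred, gt))                # overlap between the two boxes
    print(thresholds_passed(pred, gt))  # thresholds this prediction satisfies
```

A box that overlaps the ground truth only loosely can still count under the VOC criterion while failing most of the stricter thresholds, which is one reason the COCO numbers in the table are much lower than the VOC numbers for the same detector.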
-
[R-CNN] Rich feature hierarchies for accurate object detection and semantic segmentation | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | [CVPR' 14] |
[pdf]
[official code - caffe]
-
[OverFeat] OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks | Pierre Sermanet, et al. | [ICLR' 14] |
[pdf]
[official code - torch]
-
[MultiBox] Scalable Object Detection using Deep Neural Networks | Dumitru Erhan, et al. | [CVPR' 14] |
[pdf]
-
[SPP-Net] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition | Kaiming He, et al. | [ECCV' 14] |
[pdf]
[official code - caffe]
[unofficial code - keras]
[unofficial code - tensorflow]
-
Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction | Yuting Zhang, et al. | [CVPR' 15] |
[pdf]
[official code - matlab]
-
[MR-CNN] Object detection via a multi-region & semantic segmentation-aware CNN model | Spyros Gidaris, Nikos Komodakis | [ICCV' 15] |
[pdf]
[official code - caffe]
-
[DeepBox] DeepBox: Learning Objectness with Convolutional Networks | Weicheng Kuo, Bharath Hariharan, Jitendra Malik | [ICCV' 15] |
[pdf]
[official code - caffe]
-
[AttentionNet] AttentionNet: Aggregating Weak Directions for Accurate Object Detection | Donggeun Yoo, et al. | [ICCV' 15] |
[pdf]
-
[Fast R-CNN] Fast R-CNN | Ross Girshick | [ICCV' 15] |
[pdf]
[official code - caffe]
-
[DeepProposal] DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers | Amir Ghodrati, et al. | [ICCV' 15] |
[pdf]
[official code - matconvnet]
-
[Faster R-CNN, RPN] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks | Shaoqing Ren, et al. | [NIPS' 15] |
[pdf]
[official code - caffe]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[YOLO v1] You Only Look Once: Unified, Real-Time Object Detection | Joseph Redmon, et al. | [CVPR' 16] |
[pdf]
[official code - c]
-
[G-CNN] G-CNN: an Iterative Grid Based Object Detector | Mahyar Najibi, et al. | [CVPR' 16] |
[pdf]
-
[AZNet] Adaptive Object Detection Using Adjacency and Zoom Prediction | Yongxi Lu, Tara Javidi | [CVPR' 16] |
[pdf]
-
[ION] Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks | Sean Bell, et al. | [CVPR' 16] |
[pdf]
-
[HyperNet] HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection | Tao Kong, et al. | [CVPR' 16] |
[pdf]
-
[OHEM] Training Region-based Object Detectors with Online Hard Example Mining | Abhinav Shrivastava, et al. | [CVPR' 16] |
[pdf]
[official code - caffe]
-
[CRAFT] CRAFT Objects from Images | Bin Yang, et al. | [CVPR' 16] |
[pdf]
[official code - caffe]
-
[MPN] A MultiPath Network for Object Detection | Sergey Zagoruyko, et al. | [BMVC' 16] |
[pdf]
[official code - torch]
-
[SSD] SSD: Single Shot MultiBox Detector | Wei Liu, et al. | [ECCV' 16] |
[pdf]
[official code - caffe]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[GBDNet] Crafting GBD-Net for Object Detection | Xingyu Zeng, et al. | [ECCV' 16] |
[pdf]
[official code - caffe]
-
[CPF] Contextual Priming and Feedback for Faster R-CNN | Abhinav Shrivastava and Abhinav Gupta | [ECCV' 16] |
[pdf]
-
[MS-CNN] A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection | Zhaowei Cai, et al. | [ECCV' 16] |
[pdf]
[official code - caffe]
-
[R-FCN] R-FCN: Object Detection via Region-based Fully Convolutional Networks | Jifeng Dai, et al. | [NIPS' 16] |
[pdf]
[official code - caffe]
[unofficial code - caffe]
-
[PVANET] PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection | Kye-Hyeon Kim, et al. | [NIPSW' 16] |
[pdf]
[official code - caffe]
-
[DeepID-Net] DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection | Wanli Ouyang, et al. | [TPAMI' 16] |
[pdf]
-
[NoC] Object Detection Networks on Convolutional Feature Maps | Shaoqing Ren, et al. | [TPAMI' 16] |
[pdf]
-
[DSSD] DSSD : Deconvolutional Single Shot Detector | Cheng-Yang Fu, et al. | [arXiv' 17] |
[pdf]
[official code - caffe]
-
[TDM] Beyond Skip Connections: Top-Down Modulation for Object Detection | Abhinav Shrivastava, et al. | [CVPR' 17] |
[pdf]
-
[FPN] Feature Pyramid Networks for Object Detection | Tsung-Yi Lin, et al. | [CVPR' 17] |
[pdf]
[unofficial code - caffe]
-
[YOLO v2] YOLO9000: Better, Faster, Stronger | Joseph Redmon, Ali Farhadi | [CVPR' 17] |
[pdf]
[official code - c]
[unofficial code - caffe]
[unofficial code - tensorflow]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[RON] RON: Reverse Connection with Objectness Prior Networks for Object Detection | Tao Kong, et al. | [CVPR' 17] |
[pdf]
[official code - caffe]
[unofficial code - tensorflow]
-
[RSA] Recurrent Scale Approximation for Object Detection in CNN | Yu Liu, et al. | [ICCV' 17] |
[pdf]
[official code - caffe]
-
[DCN] Deformable Convolutional Networks | Jifeng Dai, et al. | [ICCV' 17] |
[pdf]
[official code - mxnet]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[DeNet] DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling | Lachlan Tychsen-Smith, Lars Petersson | [ICCV' 17] |
[pdf]
[official code - theano]
-
[CoupleNet] CoupleNet: Coupling Global Structure with Local Parts for Object Detection | Yousong Zhu, et al. | [ICCV' 17] |
[pdf]
[official code - caffe]
-
[RetinaNet] Focal Loss for Dense Object Detection | Tsung-Yi Lin, et al. | [ICCV' 17] |
[pdf]
[official code - keras]
[unofficial code - pytorch]
[unofficial code - mxnet]
[unofficial code - tensorflow]
-
[Mask R-CNN] Mask R-CNN | Kaiming He, et al. | [ICCV' 17] |
[pdf]
[official code - caffe2]
[unofficial code - tensorflow]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[DSOD] DSOD: Learning Deeply Supervised Object Detectors from Scratch | Zhiqiang Shen, et al. | [ICCV' 17] |
[pdf]
[official code - caffe]
[unofficial code - pytorch]
-
[SMN] Spatial Memory for Context Reasoning in Object Detection | Xinlei Chen, Abhinav Gupta | [ICCV' 17] |
[pdf]
-
[Light-Head R-CNN] Light-Head R-CNN: In Defense of Two-Stage Object Detector | Zeming Li, et al. | [arXiv' 17] |
[pdf]
[official code - tensorflow]
-
[Soft-NMS] Improving Object Detection With One Line of Code | Navaneeth Bodla, et al. | [ICCV' 17] |
[pdf]
[official code - caffe]
-
[YOLO v3] YOLOv3: An Incremental Improvement | Joseph Redmon, Ali Farhadi | [arXiv' 18] |
[pdf]
[official code - c]
[unofficial code - pytorch]
[unofficial code - pytorch]
[unofficial code - keras]
[unofficial code - tensorflow]
-
[ZIP] Zoom Out-and-In Network with Recursive Training for Object Proposal | Hongyang Li, et al. | [IJCV' 18] |
[pdf]
[official code - caffe]
-
[SIN] Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships | Yong Liu, et al. | [CVPR' 18] |
[pdf]
[official code - tensorflow]
-
[STDN] Scale-Transferrable Object Detection | Peng Zhou, et al. | [CVPR' 18] |
[pdf]
-
[RefineDet] Single-Shot Refinement Neural Network for Object Detection | Shifeng Zhang, et al. | [CVPR' 18] |
[pdf]
[official code - caffe]
[unofficial code - chainer]
[unofficial code - pytorch]
-
[MegDet] MegDet: A Large Mini-Batch Object Detector | Chao Peng, et al. | [CVPR' 18] |
[pdf]
-
[DA Faster R-CNN] Domain Adaptive Faster R-CNN for Object Detection in the Wild | Yuhua Chen, et al. | [CVPR' 18] |
[pdf]
[official code - caffe]
-
[SNIP] An Analysis of Scale Invariance in Object Detection – SNIP | Bharat Singh, Larry S. Davis | [CVPR' 18] |
[pdf]
-
[Relation-Network] Relation Networks for Object Detection | Han Hu, et al. | [CVPR' 18] |
[pdf]
[official code - mxnet]
-
[Cascade R-CNN] Cascade R-CNN: Delving into High Quality Object Detection | Zhaowei Cai, et al. | [CVPR' 18] |
[pdf]
[official code - caffe]
-
Finding Tiny Faces in the Wild with Generative Adversarial Network | Yancheng Bai, et al. | [CVPR' 18] |
[pdf]
-
[MLKP] Multi-scale Location-aware Kernel Representation for Object Detection | Hao Wang, et al. | [CVPR' 18] |
[pdf]
[official code - caffe]
-
Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation | Naoto Inoue, et al. | [CVPR' 18] |
[pdf]
[official code - chainer]
-
[Fitness NMS] Improving Object Localization with Fitness NMS and Bounded IoU Loss | Lachlan Tychsen-Smith, Lars Petersson | [CVPR' 18] |
[pdf]
-
[STDnet] STDnet: A ConvNet for Small Target Detection | Brais Bosquet, et al. | [BMVC' 18] |
[pdf]
-
[RFBNet] Receptive Field Block Net for Accurate and Fast Object Detection | Songtao Liu, et al. | [ECCV' 18] |
[pdf]
[official code - pytorch]
-
Zero-Annotation Object Detection with Web Knowledge Transfer | Qingyi Tao, et al. | [ECCV' 18] |
[pdf]
-
[CornerNet] CornerNet: Detecting Objects as Paired Keypoints | Hei Law, et al. | [ECCV' 18] |
[pdf]
[official code - pytorch]
-
[PFPNet] Parallel Feature Pyramid Network for Object Detection | Seung-Wook Kim, et al. | [ECCV' 18] |
[pdf]
-
[Softer-NMS] Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection | Yihui He, et al. | [arXiv' 18] |
[pdf]
-
[ShapeShifter] ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector | Shang-Tse Chen, et al. | [ECML-PKDD' 18] |
[pdf]
[official code - tensorflow]
-
[Pelee] Pelee: A Real-Time Object Detection System on Mobile Devices | Jun Wang, et al. | [NIPS' 18] |
[pdf]
[official code - caffe]
-
[HKRM] Hybrid Knowledge Routed Modules for Large-scale Object Detection | ChenHan Jiang, et al. | [NIPS' 18] |
[pdf]
-
[MetaAnchor] MetaAnchor: Learning to Detect Objects with Customized Anchors | Tong Yang, et al. | [NIPS' 18] |
[pdf]
-
[SNIPER] SNIPER: Efficient Multi-Scale Training | Bharat Singh, et al. | [NIPS' 18] |
[pdf]
-
[M2Det] M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | Qijie Zhao, et al. | [AAAI' 19] |
[pdf]
[official code - pytorch]
-
[R-DAD] Object Detection based on Region Decomposition and Assembly | Seung-Hwan Bae | [AAAI' 19] |
[pdf]
-
[CAMOU] CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild | Yang Zhang, et al. | [ICLR' 19] |
[pdf]
Statistics of commonly used object detection datasets. The figure comes from this survey paper.
The papers on the datasets most commonly used in object detection are as follows.
-
[PASCAL VOC] The PASCAL Visual Object Classes (VOC) Challenge | Mark Everingham, et al. | [IJCV' 10] |
[pdf]
-
[PASCAL VOC] The PASCAL Visual Object Classes Challenge: A Retrospective | Mark Everingham, et al. | [IJCV' 15] |
[pdf]
[link]
-
[ImageNet] ImageNet: A Large-Scale Hierarchical Image Database | Jia Deng, et al. | [CVPR' 09] |
[pdf]
-
[ImageNet] ImageNet Large Scale Visual Recognition Challenge | Olga Russakovsky, et al. | [IJCV' 15] |
[pdf]
[link]
-
[COCO] Microsoft COCO: Common Objects in Context | Tsung-Yi Lin, et al. | [ECCV' 14] |
[pdf]
[link]
-
[Open Images] The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale | A Kuznetsova, et al. | [arXiv' 18] |
[pdf]
[link]
If you have any suggestions about papers, feel free to mail me :)