- LeNet http://yann.lecun.com/exdb/lenet/index.html
- AlexNet http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
- ZFNet(Visualizing and Understanding Convolutional Networks) https://arxiv.org/abs/1311.2901
- VGG https://arxiv.org/abs/1409.1556
- GoogLeNet, Inceptionv1(Going Deeper with Convolutions) https://arxiv.org/abs/1409.4842
- Batch Normalization https://arxiv.org/abs/1502.03167
- Inceptionv3(Rethinking the Inception Architecture for Computer Vision) https://arxiv.org/abs/1512.00567
- Inceptionv4, Inception-ResNet https://arxiv.org/abs/1602.07261
- Xception(Deep Learning with Depthwise Separable Convolutions) https://arxiv.org/abs/1610.02357
- ResNet https://arxiv.org/abs/1512.03385
- ResNeXt https://arxiv.org/abs/1611.05431
- DenseNet https://arxiv.org/abs/1608.06993
- NASNet-A(Learning Transferable Architectures for Scalable Image Recognition) https://arxiv.org/abs/1707.07012
- SENet(Squeeze-and-Excitation Networks) https://arxiv.org/abs/1709.01507
- MobileNet(v1) https://arxiv.org/abs/1704.04861
- MobileNet(v2) https://arxiv.org/abs/1801.04381
- MobileNet(v3) https://arxiv.org/abs/1905.02244
- ShuffleNet(v1) https://arxiv.org/abs/1707.01083
- ShuffleNet(v2) https://arxiv.org/abs/1807.11164
- Bag of Tricks for Image Classification with Convolutional Neural Networks https://arxiv.org/abs/1812.01187
- EfficientNet(v1) https://arxiv.org/abs/1905.11946
- EfficientNet(v2) https://arxiv.org/abs/2104.00298
- CSPNet https://arxiv.org/abs/1911.11929
- RegNet https://arxiv.org/abs/2003.13678
- NFNets(High-Performance Large-Scale Image Recognition Without Normalization) https://arxiv.org/abs/2102.06171
- Vision Transformer https://arxiv.org/abs/2010.11929
- DeiT(Training data-efficient image transformers) https://arxiv.org/abs/2012.12877
- Swin Transformer https://arxiv.org/abs/2103.14030
- Swin Transformer V2: Scaling Up Capacity and Resolution https://arxiv.org/abs/2111.09883
- BEiT: BERT Pre-Training of Image Transformers https://arxiv.org/abs/2106.08254
- MAE(Masked Autoencoders Are Scalable Vision Learners) https://arxiv.org/abs/2111.06377
- ConvNeXt(A ConvNet for the 2020s) https://arxiv.org/abs/2201.03545
- R-CNN https://arxiv.org/abs/1311.2524
- Fast R-CNN https://arxiv.org/abs/1504.08083
- Faster R-CNN https://arxiv.org/abs/1506.01497
- Cascade R-CNN: Delving into High Quality Object Detection https://arxiv.org/abs/1712.00726
- Mask R-CNN https://arxiv.org/abs/1703.06870
- SSD https://arxiv.org/abs/1512.02325
- FPN(Feature Pyramid Networks for Object Detection) https://arxiv.org/abs/1612.03144
- RetinaNet(Focal Loss for Dense Object Detection) https://arxiv.org/abs/1708.02002
- Bag of Freebies for Training Object Detection Neural Networks https://arxiv.org/abs/1902.04103
- YOLOv1 https://arxiv.org/abs/1506.02640
- YOLOv2 https://arxiv.org/abs/1612.08242
- YOLOv3 https://arxiv.org/abs/1804.02767
- YOLOv4 https://arxiv.org/abs/2004.10934
- YOLOX(Exceeding YOLO Series in 2021) https://arxiv.org/abs/2107.08430
- PP-YOLO https://arxiv.org/abs/2007.12099
- PP-YOLOv2 https://arxiv.org/abs/2104.10419
- CornerNet https://arxiv.org/abs/1808.01244
- FCOS https://arxiv.org/abs/1904.01355
- CenterNet https://arxiv.org/abs/1904.07850
- FCN(Fully Convolutional Networks for Semantic Segmentation) https://arxiv.org/abs/1411.4038
- UNet(U-Net: Convolutional Networks for Biomedical Image Segmentation) https://arxiv.org/abs/1505.04597
- DeepLabv1(Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs) https://arxiv.org/abs/1412.7062
- DeepLabv2(Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs) https://arxiv.org/abs/1606.00915
- DeepLabv3(Rethinking Atrous Convolution for Semantic Image Segmentation) https://arxiv.org/abs/1706.05587
- DeepLabv3+(Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation) https://arxiv.org/abs/1802.02611
- Mask R-CNN https://arxiv.org/abs/1703.06870
- Attention Is All You Need https://arxiv.org/abs/1706.03762
- Microsoft COCO: Common Objects in Context https://arxiv.org/abs/1405.0312
- The PASCAL Visual Object Classes Challenge: A Retrospective http://host.robots.ox.ac.uk/pascal/VOC/pubs/everingham15.pdf
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization https://arxiv.org/abs/1610.02391