New Study : https://www.notion.so/Reading-Papers-Deep-Learning-504b50ddaed14360b34dfd6d49cb3455
Update 2024.01.09
- This is a personal study log; I work on it diligently, but these are not polished reviews.
- Entries keep being updated even after a review is finished, whenever new questions, thoughts, corrections, or good resources come up.
- link_review entries point to good reviews written by others.
- light_review marks a paper I only skimmed at the concept (abstract) level.
- Since I currently cannot publish the reviews themselves, this list is organized with paper links only.
Virtual Try On [Link]
Asymmetric Image Retrieval [Link]
Diffusion [Link]
- Revisiting Small Batch Training for Deep Neural Networks : [paper][review]
- Weight Standardization : [paper][link_review] [link_review]
- Effects of Image Size on Deep Learning : [paper]
- Inductive Bias : [link_review]
- Learning Discriminative Representations for Multi-Label Image Recognition : [paper]
- Knowledge distillation: A good teacher is patient and consistent : [paper]
- Hierarchical Self-supervised Augmented Knowledge Distillation : [paper]
- Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation : [paper]
Vision and Language Pre-trained [Link]
- CLIP : Learning Transferable Visual Models From Natural Language Supervision : [paper] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
- How Much Can CLIP Benefit Vision-and-Language Tasks? : [paper]
- Zero-Shot Open Set Detection by Extending CLIP : [paper]
- Bag of Tricks for Image Classification with Convolutional Neural Networks : [paper] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
- Unsupervised Representation Learning by Predicting Image Rotations : [paper]
- Unsupervised Visual Representation Learning by Context Prediction : [paper]
- Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles : [paper]
- Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks : [paper]
- Rethinking Pre-training and Self-training : [paper]
- Selfie: Self-supervised Pretraining for Image Embedding : [paper] [light_review]
- Self-training with Noisy Student improves ImageNet classification : [paper] [review]
- SimCLR : A Simple Framework for Contrastive Learning of Visual Representations : [paper] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
- SimCLR V2:Big Self-Supervised Models are Strong Semi-Supervised Learners : [paper]
- MoCo : Momentum Contrast for Unsupervised Visual Representation Learning : [paper]
- MoCo V2 : Improved Baselines with Momentum Contrastive Learning : [paper] [link_review] [link_review]
- MoCo V3 : An Empirical Study of Training Self-Supervised Vision Transformers: [paper] [link_review] [link_review]
- BYOL : Bootstrap your own latent: A new approach to self-supervised Learning: [paper]
- Exploring the limits of weakly supervised pretraining : [paper]
- Triplet is All You Need with Random Mappings for Unsupervised Visual Representation Learning : [paper]
- ScatSimCLR: self-supervised contrastive learning with pretext task regularization for small-scale datasets : [paper]
- MST: Masked Self-Supervised Transformer for Visual Representation : [paper]
- Masked Autoencoders Are Scalable Vision Learners : [paper]
- SimMIM: A Simple Framework for Masked Image Modeling : [paper]
- InsCLR: Improving Instance Retrieval with Self-Supervision : [paper]
- Stand-Alone Self-Attention in Vision Models : [paper][review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
- Selfie: Self-supervised Pretraining for Image Embedding : [paper] [light_review] [link_review] [link_review]
- ViT:An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: [paper] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
- DeiT:Training data-efficient image transformers & distillation through attention : [paper] [link_review] [link_review] [link_review] [link_review]
- Bottleneck Transformers for Visual Recognition: [paper] [link_review]
- Going deeper with Image Transformers: [paper]
- Rethinking Spatial Dimensions of Vision Transformers : [paper]
- On the Adversarial Robustness of Visual Transformers: [paper]
- TransFG: A Transformer Architecture for Fine-grained Recognition : [paper]
- Understanding Robustness of Transformers for Image Classification : [paper]
- DeepViT: Towards Deeper Vision Transformer : [paper]
- CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification : [paper]
- CvT: Introducing Convolutions to Vision Transformers: [paper] [link_review]
- Efficient Feature Transformations for Discriminative and Generative Continual Learning : [paper]
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows : [paper] [link_review] [link_review] [link_review]
- Can Vision Transformers Learn without Natural Images?: [paper]
- Scaling Local Self-Attention for Parameter Efficient Visual Backbones: [paper]
- Incorporating Convolution Designs into Visual Transformers : [paper]
- ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases : [paper]
- Explicitly Modeled Attention Maps for Image Classification : [paper]
- Conditional Positional Encodings for Vision Transformers : [paper]
- Transformer in Transformer: [paper] [link_review]
- A Survey on Visual Transformer: [paper]
- Co-Scale Conv-Attentional Image Transformers: [paper]
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity : [paper] [link_review]
- LocalViT: Bringing Locality to Vision Transformers : [paper]
- Visformer: The Vision-friendly Transformer : [paper]
- Multiscale Vision Transformers : [paper] [link_review] [link_review]
- So-ViT: Mind Visual Tokens for Vision Transformer: [paper]
- Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet (later renamed "All Tokens Matter: Token Labeling for Training Better Vision Transformers"): [paper]
- Fourier Image Transformer: [paper]
- Emerging Properties in Self-Supervised Vision Transformers: [paper]
- ConTNet: Why not use convolution and transformer at the same time?: [paper]
- Twins: Revisiting Spatial Attention Design in Vision Transformers: [paper]
- MoCo V3 :An Empirical Study of Training Self-Supervised Vision Transformers: [paper] [link_review] [link_review]
- Conformer: Local Features Coupling Global Representations for Visual Recognition: [paper]
- Self-Supervised Learning with Swin Transformers: [paper]
- Are Pre-trained Convolutions Better than Pre-trained Transformers?: [paper]
- LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference: [paper]
- Are Convolutional Neural Networks or Transformers more like human vision?: [paper]
- Rethinking Skip Connection with Layer Normalization in Transformers and ResNets: [paper]
- Rethinking the Design Principles of Robust Vision Transformer (Towards Robust Vision Transformer): [paper]
- Longformer: The Long-Document Transformer : [paper] [link_review] [link_review] [link_review] [link_review]
- Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding: [paper]
- On the Robustness of Vision Transformers to Adversarial Examples: [paper]
- Refiner: Refining Self-attention for Vision Transformers: [paper]
- Patch Slimming for Efficient Vision Transformers: [paper]
- RegionViT: Regional-to-Local Attention for Vision Transformers: [paper]
- X-volution: On the unification of convolution and self-attention: [paper]
- The Image Local Autoregressive Transformer: [paper]
- Glance-and-Gaze Vision Transformer: [paper]
- Semantic Correspondence with Transformers: [paper]
- DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification: [paper]
- When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations: [paper] [link_review]
- KVT: k-NN Attention for Boosting Vision Transformers: [paper]
- Less is More: Pay Less Attention in Vision Transformers: [paper]
- FoveaTer: Foveated Transformer for Image Classification: [paper]
- An Attention Free Transformer: [paper]
- Not All Images are Worth 16x16 Words: Dynamic Vision Transformers with Adaptive Sequence Length: [paper]
- Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks: [paper] [link_review]
- Pre-Trained Image Processing Transformer: [paper] [link_review]
- ResT: An Efficient Transformer for Visual Recognition: [paper]
- Towards Robust Vision Transformer: [paper]
- Aggregating Nested Transformers: [paper]
- GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathology Image Classification: [paper]
- Intriguing Properties of Vision Transformers: [paper] [link_review] [link_review] [link_review]
- Vision Transformers are Robust Learners: [paper]
- Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer: [paper]
- A Survey of Transformers: [paper]
- Armour: Generalizable Compact Self-Attention for Vision Transformers : [paper]
- Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer : [paper]
- Dual-stream Network for Visual Recognition : [paper]
- BEiT: BERT Pre-Training of Image Transformers : [paper]
- Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions : [paper]
- PVTv2: Improved Baselines with Pyramid Vision Transformer : [paper]
- Thinking Like Transformers : [paper]
- CMT: Convolutional Neural Networks Meet Vision Transformers : [paper] [link_review] [link_review]
- Transformer with Peak Suppression and Knowledge Guidance for Fine-grained Image Recognition : [paper]
- ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias : [paper]
- Visual Transformer Pruning : [paper]
- Local-to-Global Self-Attention in Vision Transformers : [paper]
- Feature Fusion Vision Transformer for Fine-Grained Visual Categorization : [paper]
- Vision Xformers: Efficient Attention for Image Classification : [paper]
- EsViT : Efficient Self-supervised Vision Transformers for Representation Learning : [paper]
- GLiT: Neural Architecture Search for Global and Local Image Transformer : [paper]
- Efficient Vision Transformers via Fine-Grained Manifold Distillation : [paper]
- What Makes for Hierarchical Vision Transformer? : [paper]
- AutoFormer: Searching Transformers for Visual Recognition : [paper]
- Focal Self-attention for Local-Global Interactions in Vision Transformers : [paper] [link_review]
- ConvNets vs. Transformers: Whose Visual Representations are More Transferable? : [paper]
- Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight : [paper]
- Mobile-Former: Bridging MobileNet and Transformer : [paper]
- Image Fusion Transformer : [paper]
- PSViT: Better Vision Transformer via Token Pooling and Attention Sharing : [paper]
- Do Vision Transformers See Like Convolutional Neural Networks? : [paper]
- Linformer: Self-Attention with Linear Complexity : [paper] [link_review]
- CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows : [paper]
- How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers : [paper]
- Searching for Efficient Multi-Stage Vision Transformers : [paper]
- Exploring and Improving Mobile Level Vision Transformers : [paper]
- XCiT: Cross-Covariance Image Transformers : [paper]
- Scaled ReLU Matters for Training Vision Transformers : [paper]
- VOLO: Vision Outlooker for Visual Recognition : [paper]
- CoAtNet: Marrying Convolution and Attention for All Data Sizes : [paper] [link_review] [link_review]
- MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer : [paper]
- A free lunch from ViT: Adaptive Attention Multi-scale Fusion Transformer for Fine-grained Visual Recognition : [paper]
- Improved Multiscale Vision Transformers for Classification and Detection : [paper]
- Self-Attention with Relative Position Representations : [paper] [link_review]
- Vision Transformer with Progressive Sampling : [paper]
- DPT: Deformable Patch-based Transformer for Visual Recognition : [paper]
- CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings : [paper]
- Rethinking and Improving Relative Position Encoding for Vision Transformer : [paper]
- Rethinking Positional Encoding : [paper]
- Relative Positional Encoding for Transformers with Linear Complexity : [paper]
- Conditional Positional Encodings for Vision Transformers : [paper]
- Pyramid Adversarial Training Improves ViT Performance : [paper]
- Shunted Self-Attention via Multi-Scale Token Aggregation : [paper]
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition : [paper]
- ATS: Adaptive Token Sampling For Efficient Vision Transformers : [paper]
- Global Interaction Modelling in Vision Transformer via Super Tokens : [paper]
- AS-MLP: An Axial Shifted MLP Architecture for Vision : [paper]
- S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision : [paper]
- ResMLP: Feedforward networks for image classification with data-efficient training: [paper]
- Pay Attention to MLPs: [paper]
- Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet: [paper]
- MLP-Mixer: An all-MLP Architecture for Vision : [paper]
- Sparse-MLP: A Fully-MLP Architecture with Conditional Computation : [paper]
- ConvMLP: Hierarchical Convolutional MLPs for Vision : [paper]
- Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition : [paper]
- MetaFormer is Actually What You Need for Vision : [paper]
- Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers : [paper] *
- Investigating the Vision Transformer Model for Image Retrieval Tasks: [paper]
- Training Vision Transformers for Image Retrieval: [paper]
- Instance-level Image Retrieval using Reranking Transformers: [paper]
- Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval: [paper]
- TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval : [paper]
- Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations : [paper]
- Vision Transformer Hashing for Image Retrieval : [paper]
- CoSformer: Detecting Co-Salient Object with Transformers: [paper]
- MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding: [paper]
- Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks: [paper]
- Medical Image Segmentation Using Squeeze-and-Expansion Transformers: [paper]
- SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers: [paper]
- Visual Transformers: Token-based Image Representation and Processing for Computer Vision : [paper]
- DETR:End-to-End Object Detection with Transformers : [paper] [link_review] [link_review] [link_review] [link_review] [link_review]
- Unifying Global-Local Representations in Salient Object Detection with Transformer : [paper]
- A Unified Efficient Pyramid Transformer for Semantic Segmentation : [paper]
- Dual-stream Network for Visual Recognition : [paper]
- MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers : [paper]
- Vision Transformers with Patch Diversification : [paper]
- Improve Vision Transformers Training by Suppressing Over-smoothing : [paper]
- SOTR: Segmenting Objects with Transformers : [paper]
- Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer : [paper]
- Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers : [paper]
- Conditional DETR for Fast Training Convergence : [paper]
- Fully Transformer Networks for Semantic Image Segmentation : [paper]
- Segmenter: Transformer for Semantic Segmentation : [paper]
- nnFormer: Interleaved Transformer for Volumetric Segmentation : [paper]
- Benchmarking Detection Transfer Learning with Vision Transformers : [paper]
- An Image is Worth 16x16 Words, What is a Video Worth?: [paper]
- Token Shift Transformer for Video Classification : [paper]
- Robust Facial Expression Recognition with Convolutional Visual Transformers : [paper]
- Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition : [paper]
- NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition : [paper]
- On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention : [paper]
- 2D Attentional Irregular Scene Text Recognizer : [paper]
- ReFormer: The Relational Transformer for Image Captioning : [paper]
- Long-Short Transformer: Efficient Transformers for Language and Vision : [paper]
- A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection : [paper]
- ViTGAN: Training GANs with Vision Transformers : [paper]
- Styleformer: Transformer based Generative Adversarial Networks with Style Vector : [paper]
- Combining Transformer Generators with Convolutional Discriminators : [paper]
- 3rd Place: A Global and Local Dual Retrieval Solution to Facebook AI Image Similarity Challenge : [paper]
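Many of the ViT-family papers above start from the same two primitives: flattening an image into 16x16 patch tokens and running scaled dot-product self-attention over them. A minimal numpy sketch with toy dimensions and random weights (`patchify` and `self_attention` are illustrative names, not from any specific paper):

```python
import numpy as np

def patchify(img, patch=16):
    # Split an (H, W, C) image into non-overlapping, flattened
    # patch tokens, as in "An Image is Worth 16x16 Words".
    H, W, C = img.shape
    rows = [img[i:i + patch, j:j + patch].reshape(-1)
            for i in range(0, H, patch)
            for j in range(0, W, patch)]
    return np.stack(rows)                           # (num_patches, patch*patch*C)

def self_attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over token matrix X of shape (N, d).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ V

rng = np.random.default_rng(0)
tokens = patchify(rng.standard_normal((224, 224, 3)))   # 196 tokens of dim 768
d = 64
proj = rng.standard_normal((tokens.shape[1], d)) * 0.01
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = self_attention(tokens @ proj, Wq, Wk, Wv)
print(tokens.shape, out.shape)  # (196, 768) (196, 64)
```

Real ViTs add a class token, learned positional embeddings, multiple heads, and MLP blocks on top; this only shows the token/attention core.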
Image Retrieval (Instance level Image Retrieval) & Deep Feature
- (My paper) All the attention you need: Global-local, spatial-channel attention for image retrieval : [paper]
- Large-Scale Image Retrieval with Attentive Deep Local Features : [paper] [review]
- NetVLAD: CNN architecture for weakly supervised place recognition : [paper][review]
- Learning visual similarity for product design with convolutional neural networks : [paper][review]
- Bags of Local Convolutional Features for Scalable Instance Search : [paper][review]
- Neural Codes for Image Retrieval : [paper][review]
- Conditional Similarity Networks : [paper][review]
- End-to-end Learning of Deep Visual Representations for Image Retrieval : [paper][review]
- CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples : [paper][review]
- Image similarity using Deep CNN and Curriculum Learning : [paper][review]
- Faster R-CNN Features for Instance Search : [paper][review]
- Regional Attention Based Deep Feature for Image Retrieval : [paper][review]
- Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination : [paper][review]
- Object retrieval with deep convolutional features : [paper][review]
- Cross-dimensional Weighting for Aggregated Deep Convolutional Features : [paper][review]
- Learning Embeddings for Product Visual Search with Triplet Loss and Online Sampling : [paper][review]
- Saliency Weighted Convolutional Features for Instance Search : [paper][review]
- 2018 Google Landmark Retrieval Challenge review : [review]
- 2019 Google Landmark Retrieval Challenge review : [review]
- REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval : [paper][review]
- Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset : [paper][review]
- Fine-tuning CNN Image Retrieval with No Human Annotation : [paper][review]
- Large Scale Landmark Recognition via Deep Metric Learning : [paper][review]
- Deep Aggregation of Regional Convolutional Activations for Content Based Image Retrieval : [paper][review]
- Challenging deep image descriptors for retrieval in heterogeneous iconographic collections : [paper][review]
- A Benchmark on Tricks for Large-scale Image Retrieval : [paper][review]
- Attention-Aware Generalized Mean Pooling for Image Retrieval : [paper][review]
- Class-Weighted Convolutional Features for Image Retrieval : [paper][review] # 100th
- Deep image retrieval losses (continuously updated) : [paper][review]
- Matchable Image Retrieval by Learning from Surface Reconstruction:[paper][review]
- Combination of Multiple Global Descriptors for Image Retrieval:[paper][review]
- Unifying Deep Local and Global Features for Efficient Image Search:[paper][review]
- ACTNET: end-to-end learning of feature activations and multi-stream aggregation for effective instance image retrieval:[paper][review]
- Google Landmarks Dataset v2 A Large-Scale Benchmark for Instance-Level Recognition and Retrieval:[paper][review]
- Detect-to-Retrieve: Efficient Regional Aggregation for Image Search:[paper][review]
- Local Features and Visual Words Emerge in Activations:[paper][review]
- Image Retrieval using Multi-scale CNN Features Pooling: [paper][review]
- MultiGrain: a unified image embedding for classes and instances: [paper][link_review] [link_review]
- Divide and Conquer the Embedding Space for Metric Learning: [paper][link_review]
- An Effective Pipeline for a Real-world Clothes Retrieval System: [paper][light_review]
- Instance Similarity Learning for Unsupervised Feature Representation : [paper]
- Towards Accurate Localization by Instance Search : [paper]
- The 2021 Image Similarity Dataset and Challenge : [paper]
- DOLG:Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features : [paper]
- Towards A Fairer Landmark Recognition Dataset : [paper]
- Recall@k Surrogate Loss with Large Batches and Similarity Mixup : [paper]
- Deep metric learning using Triplet network : [paper][review]
- FaceNet: A Unified Embedding for Face Recognition and Clustering : [paper][review]
- Sampling Matters in Deep Embedding Learning : [paper][review]
- Online Progressive Deep Metric Learning : [paper]
- Learning Embeddings for Product Visual Search with Triplet Loss and Online Sampling : [paper][review]
- Conditional Similarity Networks : [paper][review]
- Semi-supervised Feature-Level Attribute Manipulation for Fashion Image Retrieval : [paper][link_review]
- Context-Aware Visual Compatibility Prediction: [paper][review] [light_review]
- Learning Type-Aware Embeddings for Fashion Compatibility : [paper] [review]
- OutfitNet: Fashion Outfit Recommendation with Attention-Based Multiple Instance Learning : [paper]
- FashionNet: Personalized Outfit Recommendation with Deep Neural Network: [paper][review]
- Self-supervised Visual Attribute Learning for Fashion Compatibility : [paper]
- Personalized Outfit Recommendation with Learnable Anchors : [paper]
- PAI-BPR: Personalized Outfit Recommendation Scheme with Attribute-wise Interpretability : [paper]
- Hierarchical Fashion Graph Network for Personalized Outfit Recommendation : [paper]
- Kaleido-BERT: Vision-Language Pre-training on Fashion Domain : [paper]
- Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback : [paper]
- SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts : [paper]
- Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining : [paper]
- RP2K: A Large-Scale Retail Product Dataset for Fine-Grained Image Classification : [paper]
- eProduct: A Million-Scale Visual Search Benchmark to Address Product Recognition Challenges : [paper]
- Regional Maximum Activations of Convolutions with Attention for Cross-domain Beauty and Personal Care Product Retrieval:[paper][review]
- Learning visual similarity for product design with convolutional neural networks : [paper][review]
- The Met Dataset: Instance-level Recognition for Artworks : [paper]
- Deep Learning of Binary Hash Codes for Fast Image Retrieval : [paper][review]
- Feature Learning based Deep Supervised Hashing with Pairwise Labels : [paper][review]
- Deep Supervised Hashing with Triplet Labels : [paper][review]
- Online Hashing with Similarity Learning : [paper]
- NetVLAD: CNN architecture for weakly supervised place recognition : [paper][review]
- Learnable pooling with Context Gating for video classification : [paper][review]
- Less is More: Learning Highlight Detection from Video Duration : [paper][review]
- Efficient Video Classification Using Fewer Frames : [paper][review]
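Two building blocks recur throughout the retrieval and metric-learning entries above: GeM pooling of a convolutional feature map into a global descriptor (e.g. "Fine-tuning CNN Image Retrieval with No Human Annotation") and the triplet loss (e.g. FaceNet). A minimal numpy sketch with hypothetical helper names:

```python
import numpy as np

def gem_pool(fmap, p=3.0, eps=1e-6):
    # Generalized-mean pooling over a (C, H, W) feature map:
    # p=1 gives average pooling, p -> inf approaches max pooling.
    x = np.clip(fmap, eps, None) ** p
    return x.mean(axis=(1, 2)) ** (1.0 / p)        # (C,) global descriptor

def triplet_loss(anchor, positive, negative, margin=0.2):
    # FaceNet-style hinge on squared L2 distances of unit-normalized embeddings.
    def l2n(v):
        return v / np.linalg.norm(v)
    a, p, n = l2n(anchor), l2n(positive), l2n(negative)
    d_ap = np.sum((a - p) ** 2)
    d_an = np.sum((a - n) ** 2)
    return max(0.0, d_ap - d_an + margin)
```

In practice the GeM exponent p is learned end-to-end and the triplet loss is computed over mined hard batches; this shows only the forward math.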
OCR - Recognition
- Synthetically Supervised Feature Learning for Scene Text Recognition : [paper][review]
- FOTS: Fast Oriented Text Spotting with a Unified Network : [paper][review]
- Robust Scene Text Recognition with Automatic Rectification : [paper][review]
- Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition : [paper]
OCR - Detection
- PixelLink: Detecting Scene Text via Instance Segmentation : [paper][review]
- EAST: An Efficient and Accurate Scene Text Detector : [paper][review]
- Scene Text Detection with Supervised Pyramid Context Network : [paper][review]
- FOTS: Fast Oriented Text Spotting with a Unified Network : [paper][review]
- Character Region Awareness for Text Detection : [paper][review]
- Squeeze-and-Excitation Networks : [paper][review]
- Spatial Transformer Networks : [paper][review]
- Tell Me Where to Look: Guided Attention Inference Network : [paper][review]
- CBAM: Convolutional Block Attention Module : [paper][review]
- BAM: Bottleneck Attention Module : [paper][review]
- Neural Machine Translation by Jointly Learning to Align and Translate : [paper][review]
- Residual Attention Network for Image Classification : [paper][review]
- Attention is all you need : [paper][review][link_review]
- Stand-Alone Self-Attention in Vision Models : [paper][review] [light_review] [light_review] [light_review] [light_review] [light_review]
- DeViSE: A Deep Visual-Semantic Embedding Model : [paper][review]
- Dual Attention Networks for Multimodal Reasoning and Matching : [paper][review]
- Learning Deep Structure-Preserving Image-Text Embeddings : [paper][review]
- Learning Two-Branch Neural Networks for Image-Text Matching Tasks : [paper] [link_review]
- Imagenet classification with deep convolutional neural networks : [paper][review]
- Going Deeper with Convolutions : [paper][review]
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices : [paper][review]
- Deep Residual Learning for Image Recognition : [paper][review]
- Aggregated Residual Transformations for Deep Neural Networks : [paper][review]
- Very Deep Convolutional Networks for Large-Scale Image Recognition : [paper][review]
- Squeeze-and-Excitation Networks : [paper][review]
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications : [paper][review]
- Pelee: A Real-Time Object Detection System on Mobile Devices : [paper][review]
- Residual Attention Network for Image Classification : [paper][review]
- Wide Residual Networks : [paper][review]
- Stand-Alone Self-Attention in Vision Models : [paper][review]
- Selective Kernel Networks : [paper][review]
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks : [paper] [link_review]
- CSPNet: A New Backbone that can Enhance Learning Capability of CNN : [paper] [link_review] [link_review]
- RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition : [paper]
- Taskonomy: Disentangling Task Transfer Learning : [paper][link_review]
- What makes ImageNet good for transfer learning? : [paper][review]
- Generative Adversarial Nets : [paper][review]
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks : [paper][review]
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks : [paper][review]
- Progressive Growing of GANs for Improved Quality, Stability, and Variation : [paper][review]
- Beholder-GAN: Generation and Beautification of Facial Images with Conditioning on Their Beauty Level : [paper][review]
- Synthetically Supervised Feature Learning for Scene Text Recognition : [paper][review]
- A Style-Based Generator Architecture for Generative Adversarial Networks : [paper][review]
- High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs : [paper][review]
- Everybody Dance Now : [paper][review]
- Be Your Own Prada: Fashion Synthesis with Structural Coherence : [paper][review]
- Fashion-Gen: The Generative Fashion Dataset and Challenge : [paper][review]
- StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks : [paper][review]
- DwNet: Dense warp-based network for pose-guided human video generation: [paper][review]
- FaceNet: A Unified Embedding for Face Recognition and Clustering : [paper][review]
- The Devil of Face Recognition is in the Noise : [paper][link_review]
- Revisiting a single-stage method for face detection : [paper][review]
- MixFaceNets: Extremely Efficient Face Recognition Networks : [paper]
- Efficient Estimation of Word Representations in Vector Space : [paper][review]
- node2vec: Scalable Feature Learning for Networks : [paper][review]
- Understanding Transformer (self-attention) fundamentals : PPT summary
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding : [paper][review] (in progress)
- DeepRank: A New Deep Architecture for Relevance Ranking in Information Retrieval : [paper][review]
- SNRM: From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing : [paper][review]
- TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank : [paper][review]
- ConvRankNet: Deep Neural Network for Learning to Rank Query-Text Pairs : [paper][review]
- KNRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling : [paper][review]
- Conv-KNRM: Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search : [paper][review]
- PACRR: A position-aware neural IR model for relevance matching : [paper][link_review]
- CEDR: Contextualized Embeddings for Document Ranking #262 : [paper][link]
- Deeper Text Understanding for IR with Contextual Neural Language Modeling : [paper][link]
- Simple Applications of BERT for Ad Hoc Document Retrieval : [paper][link]
- Document Expansion by Query Prediction : [paper][link]
- Passage Re-ranking with BERT : [paper][link]
- U-Net: Convolutional Networks for Biomedical Image Segmentation : [paper][review]
- Mask R-CNN : [paper][review]
- Fully Convolutional Networks for Semantic Segmentation : [paper][review]
- Cascade Decoder: A Universal Decoding Method for Biomedical Image Segmentation : [paper] [review]
- FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference : [paper] [link_review] [link_review]
- Path Aggregation Network for Instance Segmentation : [paper] [link_review]
- YOLO: Real-Time Object Detection : [paper][review]
- YOLO9000: Better, Faster, Stronger : [paper][review]
- YOLOv4: Optimal Speed and Accuracy of Object Detection : [paper] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
- Scaled-YOLOv4: Scaling Cross Stage Partial Network : [paper]
- Faster R-CNN : [paper][review]
- Understanding Faster R-CNN's anchor generator, not just conceptually but also at the source-code level : [review]
- SSD: Single Shot MultiBox Detector : [paper][link_review]
- Why is normalization performed only for conv4_3? : [review]
- Pelee: A Real-Time Object Detection System on Mobile Devices : [paper][review]
- R-FCN: Object Detection via Region-based Fully Convolutional Networks: [paper][review]
- Revisiting a single-stage method for face detection: [paper][review]
- DSSD : Deconvolutional Single Shot Detector: [paper][review]
- Feature-fused SSD: fast detection for small objects : [paper][link_review]
- EfficientDet : Scalable and Efficient Object Detection : [paper] [link_review] [review]
- FCOS: Fully Convolutional One-Stage Object Detection : [paper] [light_review]
- Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection : [paper] [light_review]
- Oriented R-CNN for Object Detection : [paper]
- CSPNet: A New Backbone that can Enhance Learning Capability of CNN : [paper] [link_review] [link_review]
- Learning Transferable Architectures for Scalable Image Recognition : [paper][link_review]
- Learning to Compose with Professional Photographs on the Web : [paper][review]
- Photo Aesthetics Ranking Network with Attributes and Content Adaptation : [paper][review]
- Composition-preserving Deep Photo Aesthetics Assessment : [paper][review]
- Deep Image Aesthetics Classification using Inception Modules and Fine-tuning Connected Layer : [paper][review]
- NIMA: Neural Image Assessment : [paper][review]
- Neural Arithmetic Logic Units : [paper][link_review]
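The last entry, Neural Arithmetic Logic Units, proposes gating between an additive accumulator and a multiplicative (log-space) one. A rough numpy sketch of the forward pass under that formulation, with weights hand-set for illustration rather than learned:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nac(x, W_hat, M_hat):
    # Neural Accumulator: effective weights tanh(W_hat) * sigmoid(M_hat)
    # are pushed toward {-1, 0, 1}, biasing the cell toward add/subtract.
    return x @ (np.tanh(W_hat) * sigmoid(M_hat))

def nalu(x, W_hat, M_hat, G, eps=1e-7):
    # Gate g interpolates between the additive path a and the
    # multiplicative path m, which is a NAC applied in log-space.
    a = nac(x, W_hat, M_hat)
    m = np.exp(nac(np.log(np.abs(x) + eps), W_hat, M_hat))
    g = sigmoid(x @ G)
    return g * a + (1 - g) * m

# Hand-set weights approximating W = [[1], [1]]: the additive path
# computes x1 + x2, the multiplicative path computes x1 * x2.
x = np.array([[2.0, 3.0]])
W_hat = np.full((2, 1), 10.0)
M_hat = np.full((2, 1), 10.0)
add = nalu(x, W_hat, M_hat, np.full((2, 1), 10.0))    # gate ~1 -> sum (~5)
mul = nalu(x, W_hat, M_hat, np.full((2, 1), -10.0))   # gate ~0 -> product (~6)
```

With saturated weights the outputs land near 5 and 6 respectively, which is the extrapolation behavior the paper argues for.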