chullhwan-song/Reading-Paper

New Study : https://www.notion.so/Reading-Papers-Deep-Learning-504b50ddaed14360b34dfd6d49cb3455

Update 2024.01.09

Paper Review

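  • These are personal study notes; I work hard on them, but the reviews are not perfect.
  • Even after a review is finished, it keeps being updated whenever new questions, thoughts, corrections, or good references come up.
  • link review points to a good review written by someone else.
  • light_link marks a paper that was only skimmed at the concept (abstract) level.
  • Because I am currently unable to publish reviews, entries are for now organized as paper links only.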

Virtual Try On [Link]

Asymmetric Image Retrieval [Link]

Diffusion [Link]

Deep Learning

Multi-Label Image Recognition

  • Learning Discriminative Representations for Multi-Label Image Recognition : [paper]

Knowledge distillation

  • Knowledge distillation: A good teacher is patient and consistent : [paper]
  • Hierarchical Self-supervised Augmented Knowledge Distillation : [paper]
  • Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation : [paper]

Vision and Language Pre-trained [Link]

CLIP & joined multi-modal

Efficient training & trick

Imbalance Datasets

Self Supervised Learning & unsupervised learning & semi/Weakly supervised learning

  • Unsupervised Representation Learning by Predicting Image Rotations : [paper]
  • Unsupervised Visual Representation Learning by Context Prediction : [paper]
  • Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles : [paper]
  • Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks : [paper]
  • Rethinking Pre-training and Self-training : [paper]
  • Selfie: Self-supervised Pretraining for Image Embedding : [paper] [light_review]
  • Self-training with Noisy Student improves ImageNet classification : [paper] [review]
  • SimCLR : A Simple Framework for Contrastive Learning of Visual Representations : [paper] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
  • SimCLR V2:Big Self-Supervised Models are Strong Semi-Supervised Learners : [paper]
  • MoCo : Momentum Contrast for Unsupervised Visual Representation Learning : [paper]
  • MoCo V2 : Improved Baselines with Momentum Contrastive Learning : [paper] [link_review] [link_review]
  • MoCo V3 : An Empirical Study of Training Self-Supervised Vision Transformers: [paper] [link_review] [link_review]
  • BYOL : Bootstrap your own latent: A new approach to self-supervised Learning: [paper]
  • Exploring the limits of weakly supervised pretraining : [paper]
  • Triplet is All You Need with Random Mappings for Unsupervised Visual Representation Learning : [paper]
  • ScatSimCLR: self-supervised contrastive learning with pretext task regularization for small-scale datasets : [paper]

Self Supervised Training + Mask based Token + Transformer

  • MST: Masked Self-Supervised Transformer for Visual Representation : [paper]
  • Masked Autoencoders Are Scalable Vision Learners : [paper]
  • SimMIM: A Simple Framework for Masked Image Modeling : [paper]

Self Supervised Training + Instance Image Retrieval

  • InsCLR: Improving Instance Retrieval with Self-Supervision : [paper]

Vision Transformers classification

  • Stand-Alone Self-Attention in Vision Models : [paper][review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
  • Selfie: Self-supervised Pretraining for Image Embedding : [paper] [light_review] [link_review] [link_review]
  • ViT:An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: [paper] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
  • DeiT:Training data-efficient image transformers & distillation through attention : [paper] [link_review] [link_review] [link_review] [link_review]
  • Bottleneck Transformers for Visual Recognition: [paper] [link_review]
  • Going deeper with Image Transformers: [paper]
  • Rethinking Spatial Dimensions of Vision Transformers : [paper]
  • On the Adversarial Robustness of Visual Transformers: [paper]
  • TransFG: A Transformer Architecture for Fine-grained Recognition : [paper]
  • Understanding Robustness of Transformers for Image Classification : [paper]
  • DeepViT: Towards Deeper Vision Transformer : [paper]
  • CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification : [paper]
  • CvT: Introducing Convolutions to Vision Transformers: [paper] [link_review]
  • Efficient Feature Transformations for Discriminative and Generative Continual Learning : [paper]
  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows : [paper] [link_review] [link_review] [link_review]
  • Can Vision Transformers Learn without Natural Images?: [paper]
  • Scaling Local Self-Attention for Parameter Efficient Visual Backbones: [paper]
  • Incorporating Convolution Designs into Visual Transformers : [paper]
  • ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases : [paper]
  • Explicitly Modeled Attention Maps for Image Classification : [paper]
  • Conditional Positional Encodings for Vision Transformers : [paper]
  • Transformer in Transformer: [paper] [link_review]
  • A Survey on Visual Transformer: [paper]
  • Co-Scale Conv-Attentional Image Transformers: [paper]
  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity : [paper] [link_review]
  • LocalViT: Bringing Locality to Vision Transformers : [paper]
  • Visformer: The Vision-friendly Transformer : [paper]
  • Multiscale Vision Transformers : [paper] [link_review] [link_review]
  • So-ViT: Mind Visual Tokens for Vision Transformer: [paper]
  • Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet (later renamed "All Tokens Matter: Token Labeling for Training Better Vision Transformers"): [paper]
  • Fourier Image Transformer: [paper]
  • Emerging Properties in Self-Supervised Vision Transformers: [paper]
  • ConTNet: Why not use convolution and transformer at the same time?: [paper]
  • Twins: Revisiting Spatial Attention Design in Vision Transformers: [paper]
  • MoCo V3 :An Empirical Study of Training Self-Supervised Vision Transformers: [paper] [link_review] [link_review]
  • Conformer: Local Features Coupling Global Representations for Visual Recognition: [paper]
  • Self-Supervised Learning with Swin Transformers: [paper]
  • Are Pre-trained Convolutions Better than Pre-trained Transformers?: [paper]
  • LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference: [paper]
  • Are Convolutional Neural Networks or Transformers more like human vision?: [paper]
  • Rethinking Skip Connection with Layer Normalization in Transformers and ResNets: [paper]
  • Rethinking the Design Principles of Robust Vision Transformer (Towards Robust Vision Transformer): [paper]
  • Longformer: The Long-Document Transformer : [paper] [link_review] [link_review] [link_review] [link_review]
  • Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding: [paper]
  • On the Robustness of Vision Transformers to Adversarial Examples: [paper]
  • Refiner: Refining Self-attention for Vision Transformers: [paper]
  • Patch Slimming for Efficient Vision Transformers: [paper]
  • RegionViT: Regional-to-Local Attention for Vision Transformers: [paper]
  • X-volution: On the unification of convolution and self-attention: [paper]
  • The Image Local Autoregressive Transformer: [paper]
  • Glance-and-Gaze Vision Transformer: [paper]
  • Semantic Correspondence with Transformers: [paper]
  • DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification: [paper]
  • When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations: [paper] [link_review]
  • KVT: k-NN Attention for Boosting Vision Transformers: [paper]
  • Less is More: Pay Less Attention in Vision Transformers: [paper]
  • FoveaTer: Foveated Transformer for Image Classification: [paper]
  • An Attention Free Transformer: [paper]
  • Not All Images are Worth 16x16 Words: Dynamic Vision Transformers with Adaptive Sequence Length: [paper]
  • Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks: [paper] [link_review]
  • Pre-Trained Image Processing Transformer: [paper] [link_review]
  • ResT: An Efficient Transformer for Visual Recognition: [paper]
  • Towards Robust Vision Transformer: [paper]
  • Aggregating Nested Transformers: [paper]
  • GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathology Image Classification: [paper]
  • Intriguing Properties of Vision Transformers: [paper] [link_review] [link_review] [link_review]
  • Vision Transformers are Robust Learners: [paper]
  • Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer: [paper]
  • A Survey of Transformers: [paper]
  • Armour: Generalizable Compact Self-Attention for Vision Transformers : [paper]
  • Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer : [paper]
  • Dual-stream Network for Visual Recognition : [paper]
  • BEiT: BERT Pre-Training of Image Transformers : [paper]
  • Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions : [paper]
  • PVTv2: Improved Baselines with Pyramid Vision Transformer : [paper]
  • Thinking Like Transformers : [paper]
  • CMT: Convolutional Neural Networks Meet Vision Transformers : [paper] [link_review] [link_review]
  • Transformer with Peak Suppression and Knowledge Guidance for Fine-grained Image Recognition : [paper]
  • ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias : [paper]
  • Visual Transformer Pruning : [paper]
  • Local-to-Global Self-Attention in Vision Transformers : [paper]
  • Feature Fusion Vision Transformer for Fine-Grained Visual Categorization : [paper]
  • Vision Xformers: Efficient Attention for Image Classification : [paper]
  • EsViT : Efficient Self-supervised Vision Transformers for Representation Learning : [paper]
  • GLiT: Neural Architecture Search for Global and Local Image Transformer : [paper]
  • Efficient Vision Transformers via Fine-Grained Manifold Distillation : [paper]
  • What Makes for Hierarchical Vision Transformer? : [paper]
  • AutoFormer: Searching Transformers for Visual Recognition : [paper]
  • Focal Self-attention for Local-Global Interactions in Vision Transformers : [paper] [link_review]
  • ConvNets vs. Transformers: Whose Visual Representations are More Transferable? : [paper]
  • Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight : [paper]
  • Mobile-Former: Bridging MobileNet and Transformer : [paper]
  • Image Fusion Transformer : [paper]
  • PSViT: Better Vision Transformer via Token Pooling and Attention Sharing : [paper]
  • Do Vision Transformers See Like Convolutional Neural Networks? : [paper]
  • Linformer: Self-Attention with Linear Complexity : [paper] [link_review]
  • CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows : [paper]
  • How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers : [paper]
  • Searching for Efficient Multi-Stage Vision Transformers : [paper]
  • Exploring and Improving Mobile Level Vision Transformers : [paper]
  • XCiT: Cross-Covariance Image Transformers : [paper]
  • Efficient Vision Transformers via Fine-Grained Manifold Distillation : [paper]
  • Scaled ReLU Matters for Training Vision Transformers : [paper]
  • VOLO: Vision Outlooker for Visual Recognition : [paper]
  • CoAtNet: Marrying Convolution and Attention for All Data Sizes : [paper] [link_review] [link_review]
  • MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer : [paper]
  • A free lunch from ViT: Adaptive Attention Multi-scale Fusion Transformer for Fine-grained Visual Recognition : [paper]
  • Improved Multiscale Vision Transformers for Classification and Detection : [paper]

Vision Transformers positional embedding

  • Self-Attention with Relative Position Representations : [paper] [link_review]
  • Vision Transformer with Progressive Sampling : [paper]
  • DPT: Deformable Patch-based Transformer for Visual Recognition : [paper]
  • CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings : [paper]
  • Rethinking and Improving Relative Position Encoding for Vision Transformer : [paper]
  • Rethinking Positional Encoding : [paper]
  • Relative Positional Encoding for Transformers with Linear Complexity : [paper]
  • Conditional Positional Encodings for Vision Transformers : [paper]
  • Pyramid Adversarial Training Improves ViT Performance : [paper]
  • Shunted Self-Attention via Multi-Scale Token Aggregation : [paper]
  • AdaViT: Adaptive Vision Transformers for Efficient Image Recognition : [paper]
  • ATS: Adaptive Token Sampling For Efficient Vision Transformers : [paper]
  • Global Interaction Modelling in Vision Transformer via Super Tokens : [paper]

Vision Transformers vs MLP (or Others)

  • AS-MLP: An Axial Shifted MLP Architecture for Vision : [paper]
  • S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision : [paper]
  • ResMLP: Feedforward networks for image classification with data-efficient training: [paper]
  • Pay Attention to MLPs: [paper]
  • Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet: [paper]
  • MLP-Mixer: An all-MLP Architecture for Vision : [paper]
  • Sparse-MLP: A Fully-MLP Architecture with Conditional Computation : [paper]
  • ConvMLP: Hierarchical Convolutional MLPs for Vision : [paper]
  • Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition : [paper]
  • MetaFormer is Actually What You Need for Vision : [paper]
  • Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers : [paper] *

Vision Transformers retrieval

  • Investigating the Vision Transformer Model for Image Retrieval Tasks: [paper]
  • Training Vision Transformers for Image Retrieval: [paper]
  • Instance-level Image Retrieval using Reranking Transformers: [paper]
  • Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval: [paper]
  • TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval : [paper]
  • Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations : [paper]
  • Vision Transformer Hashing for Image Retrieval : [paper]

Vision Transformers segmentation and detection

  • CoSformer: Detecting Co-Salient Object with Transformers: [paper]
  • MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding: [paper]
  • Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks: [paper]
  • Medical Image Segmentation Using Squeeze-and-Expansion Transformers: [paper]
  • SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers: [paper]
  • Visual Transformers: Token-based Image Representation and Processing for Computer Vision : [paper]
  • DETR:End-to-End Object Detection with Transformers : [paper] [link_review] [link_review] [link_review] [link_review] [link_review]
  • Unifying Global-Local Representations in Salient Object Detection with Transformer : [paper]
  • A Unified Efficient Pyramid Transformer for Semantic Segmentation : [paper]
  • Dual-stream Network for Visual Recognition : [paper]
  • MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers : [paper]
  • Vision Transformers with Patch Diversification : [paper]
  • Improve Vision Transformers Training by Suppressing Over-smoothing : [paper]
  • SOTR: Segmenting Objects with Transformers : [paper]
  • Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer : [paper]
  • Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers : [paper]
  • Unifying Global-Local Representations in Salient Object Detection with Transformer : [paper]
  • Conditional DETR for Fast Training Convergence : [paper]
  • Fully Transformer Networks for Semantic Image Segmentation : [paper]
  • Segmenter: Transformer for Semantic Segmentation : [paper]
  • nnFormer: Interleaved Transformer for Volumetric Segmentation : [paper]
  • Benchmarking Detection Transfer Learning with Vision Transformers : [paper]

Vision Transformers video

  • An Image is Worth 16x16 Words, What is a Video Worth?: [paper]
  • Token Shift Transformer for Video Classification : [paper]

Vision Transformers face

  • Robust Facial Expression Recognition with Convolutional Visual Transformers : [paper]
  • Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition : [paper]

Vision Transformers OCR

  • NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition : [paper]
  • On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention : [paper]
  • 2D Attentional Irregular Scene Text Recognizer : [paper]

Vision Transformers multi-modal

  • ReFormer: The Relational Transformer for Image Captioning : [paper]
  • Long-Short Transformer: Efficient Transformers for Language and Vision : [paper]

Vision Transformers GAN

  • A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection : [paper]
  • ViTGAN: Training GANs with Vision Transformers : [paper]
  • Styleformer: Transformer based Generative Adversarial Networks with Style Vector : [paper]
  • Combining Transformer Generators with Convolutional Discriminators : [paper]

Facebook AI Image Similarity Challenge

  • 3rd Place: A Global and Local Dual Retrieval Solution to Facebook AI Image Similarity Challenge : [paper]

Google Landmark Challenge

Image Retrieval (Instance level Image Retrieval) & Deep Feature

  • (My paper) All the attention you need: Global-local, spatial-channel attention for image retrieval : [paper]
  • Large-Scale Image Retrieval with Attentive Deep Local Features : [paper] [review]
  • NetVLAD: CNN architecture for weakly supervised place recognition : [paper][review]
  • Learning visual similarity for product design with convolutional neural networks : [paper][review]
  • Bags of Local Convolutional Features for Scalable Instance Search : [paper][review]
  • Neural Codes for Image Retrieval : [paper][review]
  • Conditional Similarity Networks : [paper][review]
  • End-to-end Learning of Deep Visual Representations for Image Retrieval : [paper][review]
  • CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples : [paper][review]
  • Image similarity using Deep CNN and Curriculum Learning : [paper][review]
  • Faster R-CNN Features for Instance Search : [paper][review]
  • Regional Attention Based Deep Feature for Image Retrieval : [paper][review]
  • Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination : [paper][review]
  • Object retrieval with deep convolutional features : [paper][review]
  • Cross-dimensional Weighting for Aggregated Deep Convolutional Features : [paper][review]
  • Learning Embeddings for Product Visual Search with Triplet Loss and Online Sampling : [paper][review]
  • Saliency Weighted Convolutional Features for Instance Search : [paper][review]
  • 2018 Google Landmark Retrieval Challenge review : [review]
  • 2019 Google Landmark Retrieval Challenge review : [review]
  • REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval : [paper][review]
  • Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset : [paper][review]
  • Fine-tuning CNN Image Retrieval with No Human Annotation : [paper][review]
  • Large Scale Landmark Recognition via Deep Metric Learning : [paper][review]
  • Deep Aggregation of Regional Convolutional Activations for Content Based Image Retrieval : [paper][review]
  • Challenging deep image descriptors for retrieval in heterogeneous iconographic collections : [paper][review]
  • A Benchmark on Tricks for Large-scale Image Retrieval : [paper][review]
  • Attention-Aware Generalized Mean Pooling for Image Retrieval : [paper][review]
  • Class-Weighted Convolutional Features for Image Retrieval : [paper][review] # 100th
  • deep image retrieval loss (continuously updated):[paper][review]
  • Matchable Image Retrieval by Learning from Surface Reconstruction:[paper][review]
  • Combination of Multiple Global Descriptors for Image Retrieval:[paper][review]
  • Unifying Deep Local and Global Features for Efficient Image Search:[paper][review]
  • ACTNET: end-to-end learning of feature activations and multi-stream aggregation for effective instance image retrieval:[paper][review]
  • Google Landmarks Dataset v2 A Large-Scale Benchmark for Instance-Level Recognition and Retrieval:[paper][review]
  • Detect-to-Retrieve: Efficient Regional Aggregation for Image Search:[paper][review]
  • Local Features and Visual Words Emerge in Activations:[paper][review]
  • Image Retrieval using Multi-scale CNN Features Pooling: [paper][review]
  • MultiGrain: a unified image embedding for classes and instances: [paper][link_review] [link_review]
  • Divide and Conquer the Embedding Space for Metric Learning: [paper][link_review]
  • An Effective Pipeline for a Real-world Clothes Retrieval System: [paper][light_review]
  • Instance Similarity Learning for Unsupervised Feature Representation : [paper]
  • Towards Accurate Localization by Instance Search : [paper]
  • The 2021 Image Similarity Dataset and Challenge : [paper]
  • DOLG:Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features : [paper]
  • Towards A Fairer Landmark Recognition Dataset : [paper]
  • Recall@k Surrogate Loss with Large Batches and Similarity Mixup : [paper]

Metric Learning

Fashion Image Retrieval

  • Learning Embeddings for Product Visual Search with Triplet Loss and Online Sampling : [paper][review]
  • Conditional Similarity Networks : [paper][review]
  • Semi-supervised Feature-Level Attribute Manipulation for Fashion Image Retrieval : [paper][link_review]

Fashion Compatibility & Outfit Recommendation

Personalized Outfit Recommendation & fashion outfit

  • FashionNet: Personalized Outfit Recommendation with Deep Neural Network: [paper][review]
  • Self-supervised Visual Attribute Learning for Fashion Compatibility : [paper]
  • Personalized Outfit Recommendation with Learnable Anchors : [paper]
  • PAI-BPR: Personalized Outfit Recommendation Scheme with Attribute-wise Interpretability : [paper]
  • Hierarchical Fashion Graph Network for Personalized Outfit Recommendation : [paper]

Fashion multi-modal

  • Kaleido-BERT: Vision-Language Pre-training on Fashion Domain : [paper]
  • Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback : [paper]

Fashion DataSets

  • SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts : [paper]

Retail & Product & Instance

  • Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining : [paper]
  • RP2K: A Large-Scale Retail Product Dataset for Fine-Grained Image Classification : [paper]
  • eProduct: A Million-Scale Visual Search Benchmark to Address Product Recognition Challenges : [paper]
  • Regional Maximum Activations of Convolutions with Attention for Cross-domain Beauty and Personal Care Product Retrieval:[paper][review]
  • Learning visual similarity for product design with convolutional neural networks : [paper][review]
  • Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining : [paper]
  • The Met Dataset: Instance-level Recognition for Artworks : [paper]

Image Retrieval using Deep Hash

  • Deep Learning of Binary Hash Codes for Fast Image Retrieval : [paper][review]
  • Feature Learning based Deep Supervised Hashing with Pairwise Labels : [paper][review]
  • Deep Supervised Hashing with Triplet Labels : [paper][review]
  • Online Hashing with Similarity Learning : [paper]

Video Classification

  • NetVLAD: CNN architecture for weakly supervised place recognition : [paper][review]
  • Learnable pooling with Context Gating for video classification : [paper][review]
  • Less is More: Learning Highlight Detection from Video Duration : [paper][review]
  • Efficient Video Classification Using Fewer Frames : [paper][review]

OCR - Recognition

  • Synthetically Supervised Feature Learning for Scene Text Recognition : [paper][review]
  • FOTS: Fast Oriented Text Spotting with a Unified Network : [paper][review]
  • Robust Scene Text Recognition with Automatic Rectification : [paper][review]
  • Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition : [paper]

OCR - Detection

Attention & Deformation

Visual & Textual Embedding

CNN

Transfer Learning

Generative Adversarial Nets

  • Generative Adversarial Nets : [paper][review]
  • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks : [paper][review]
  • Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks : [paper][review]
  • Progressive Growing of GANs for Improved Quality, Stability, and Variation : [paper][review]
  • Beholder-GAN: Generation and Beautification of Facial Images with Conditioning on Their Beauty Level : [paper][review]
  • Synthetically Supervised Feature Learning for Scene Text Recognition : [paper][review]
  • A Style-Based Generator Architecture for Generative Adversarial Networks : [paper][review]
  • High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs : [paper][review]
  • Everybody Dance Now : [paper][review]
  • Be Your Own Prada: Fashion Synthesis with Structural Coherence : [paper][review]
  • Fashion-Gen: The Generative Fashion Dataset and Challenge : [paper][review]
  • StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks : [paper][review]
  • DwNet: Dense warp-based network for pose-guided human video generation: [paper][review]

Face

  • FaceNet: A Unified Embedding for Face Recognition and Clustering : [paper][review]
  • The Devil of Face Recognition is in the Noise : [paper][link_review]
  • Revisiting a single-stage method for face detection : [paper][review]
  • MixFaceNets: Extremely Efficient Face Recognition Networks : [paper]

Pose Estimation

NLP/NLU

  • Efficient Estimation of Word Representations in Vector Space : [paper][review]
  • node2vec: Scalable Feature Learning for Networks : [paper][review]
  • Understanding the basics of the Transformer (self-attention) : PPT summary
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding : [paper][review](~ing)
  • DeepRank: A New Deep Architecture for Relevance Ranking in Information Retrieval : [paper][review]
  • SNRM: From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing : [paper][review]
  • TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank : [paper][review]
  • ConvRankNet: Deep Neural Network for Learning to Rank Query-Text Pairs : [paper][review]
  • KNRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling : [paper][review]
  • Conv-KNRM: Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search : [paper][review]
  • PACRR: A position-aware neural IR model for relevance matching : [paper][link_review]
  • CEDR: Contextualized Embeddings for Document Ranking #262 : [paper][link]
  • Deeper Text Understanding for IR with Contextual Neural Language Modeling : [paper][link]
  • Simple Applications of BERT for Ad Hoc Document Retrieval : [paper][link]
  • Document Expansion by Query Prediction : [paper][link]
  • Passage Re-ranking with BERT : [paper][link]

Domain Adaptation

Curriculum Learning

  • CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images : [paper][review]

Image Segmentation

Localization

AutoML

Image Quality

  • Learning to Compose with Professional Photographs on the Web : [paper][review]
  • Photo Aesthetics Ranking Network with Attributes and Content Adaptation : [paper][review]
  • Composition-preserving Deep Photo Aesthetics Assessment : [paper][review]
  • Deep Image Aesthetics Classification using Inception Modules and Fine-tuning Connected Layer : [paper][review]
  • NIMA: Neural Image Assessment : [paper][review]

Others
