A curated list of vision transformer related resources. Please feel free to open a pull request or an issue to add papers.
Title | Venue | BibTeX |
---|---|---|
A Survey on Visual Transformer | ArXiv | Bib |
Task | Reg | Det | Seg | Trk | Other |
---|---|---|---|---|---|
Explanation | Image Recognition | Object Detection | Image Segmentation | Object Tracking | Other types |
You can add a tag for any domain that contains several transformer-based works.
(Please list papers in reverse chronological order, newest first.)
Title | Venue | Task | Code | BibTeX |
---|---|---|---|---|
[T2T-ViT] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | ArXiv | Reg | GitHub | Bib |
[BoTNet] Bottleneck Transformers for Visual Recognition | ArXiv | Reg | GitHub | Bib |
[SSTVOS] SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation | ArXiv | Seg | --- | Bib |
[TrackFormer] TrackFormer: Multi-Object Tracking with Transformers | ArXiv | Trk | --- | Bib |
Title | Venue | Task | Code | BibTeX |
---|---|---|---|---|
[DeiT] Training data-efficient image transformers & distillation through attention | ArXiv | Reg | GitHub | Bib |
[ViT] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ICLR | Reg | GitHub | Bib |
[ViT-FRCNN] Toward Transformer-Based Object Detection | ArXiv | Det | --- | Bib |
[TSP-FCOS] Rethinking Transformer-based Set Prediction for Object Detection | ArXiv | Det | --- | Bib |
[UP-DETR] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers | ArXiv | Det | --- | Bib |
[Deformable DETR] Deformable DETR: Deformable Transformers for End-to-End Object Detection | ArXiv | Det | GitHub | Bib |
[DETR] End-to-End Object Detection with Transformers | ECCV | Det | GitHub | Bib |
[SETR] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers | ArXiv | Seg | GitHub | Bib |
[MaX-DeepLab] MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers | ArXiv | Seg | --- | Bib |
[TransTrack]TransTrack: Multiple-Object Tracking with Transformer | ArXiv | Trk | GitHub | Bib |
Title | Venue | Task | Code | BibTeX |
---|---|---|---|---|
[SASA] Stand-Alone Self-Attention in Vision Models | ArXiv | Reg | --- | --- |
Title | Venue | Task | Code | BibTeX |
---|---|---|---|---|
Attention Is All You Need | NeurIPS'17 | --- | GitHub | Bib |