Presenter: Andreas Steiner
Abstract:
With enough data and compute, Vision Transformers have achieved state-of-the-art results on many vision tasks, replacing stacked convolutions with layers of self-attention.
This 1.5-hour tutorial interleaves short presentations with practical sessions in which participants learn how to write ML models using JAX/Flax, how to implement a simple Vision Transformer from scratch, and finally how to use code from the official GitHub repo to explore checkpoints, load them for inference, and fine-tune them on custom datasets.
- Presentation: vision_transformers.pdf
- Practical Session (Colab): vision_transformers.ipynb
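To give a flavor of the "from scratch" part, here is a minimal sketch (not the tutorial's actual code) of the self-attention operation that replaces stacked convolutions in a Vision Transformer, written in plain JAX; the function name, dimensions, and weight initialization are illustrative assumptions.

```python
# Minimal single-head self-attention over patch embeddings (illustrative sketch).
import jax
import jax.numpy as jnp

def self_attention(x, w_q, w_k, w_v):
    """x: (num_patches, dim) patch embeddings; w_q/w_k/w_v: (dim, dim) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / jnp.sqrt(x.shape[-1])   # (num_patches, num_patches)
    weights = jax.nn.softmax(scores, axis=-1)  # each patch attends to every patch
    return weights @ v                         # (num_patches, dim)

key = jax.random.PRNGKey(0)
dim, num_patches = 64, 196  # e.g. 14x14 patches of a 224x224 image, 16x16 each
w_q, w_k, w_v = (jax.random.normal(k, (dim, dim)) * 0.02
                 for k in jax.random.split(key, 3))
x = jax.random.normal(key, (num_patches, dim))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (196, 64)
```

A full Vision Transformer block additionally wraps this in multi-head projections, layer normalization, residual connections, and an MLP, which the Colab covers in detail.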