This repository contains all the code I wrote or used while reading up on Vision Transformers, following the research paper: https://arxiv.org/abs/2010.11929
This Jupyter notebook contains the code for the model that I wrote, alongside explanations.
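For orientation, here is a minimal sketch of the core patchify-and-embed step from the paper (Dosovitskiy et al., 2020); the notebook's actual implementation may differ in names and details:

```python
# Sketch of ViT patch embedding: split an image into fixed-size patches
# and linearly project each one into a token. Not the notebook's exact code.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is the standard trick for patchify + linear projection.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, N, D): a sequence of patch tokens

tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```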
This imports a ViT model pretrained on ImageNet and applies it to a video input, producing an output video with the classification overlaid as text.
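A minimal sketch of that per-frame pipeline, assuming a timm ViT checkpoint and OpenCV for video I/O (the model name and file names here are placeholders, not necessarily what this repo uses):

```python
# Classify each frame of a video with a pretrained ViT and write the
# predicted ImageNet class index onto the output frames.
import cv2
import timm
import torch
from PIL import Image
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform

model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()
transform = create_transform(**resolve_data_config({}, model=model))

cap = cv2.VideoCapture("vid.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("video.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    with torch.no_grad():
        logits = model(transform(img).unsqueeze(0))
    label = logits.argmax(dim=-1).item()  # ImageNet class index
    cv2.putText(frame, str(label), (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    out.write(frame)

cap.release()
out.release()
```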
vid.mp4 and video.mp4 are examples of the input and output videos.
This is code for a pretrained model that I modified to return the attention weights alongside the output.
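One way to get those attention weights without rewriting the model is a forward hook on an attention block. The sketch below assumes the internal layout of timm's ViT `Attention` module (a fused `qkv` linear, `num_heads`, `scale`); the repo's modified model may take a different approach:

```python
# Capture the last block's attention weights alongside the prediction
# by recomputing softmax(q @ k^T) from the block's input inside a hook.
import timm
import torch

model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()
attn_maps = []

def grab(module, inputs, output):
    x = inputs[0]
    B, N, C = x.shape
    qkv = (module.qkv(x)
           .reshape(B, N, 3, module.num_heads, C // module.num_heads)
           .permute(2, 0, 3, 1, 4))        # (3, B, heads, N, head_dim)
    q, k = qkv[0], qkv[1]
    attn_maps.append(((q @ k.transpose(-2, -1)) * module.scale).softmax(dim=-1))

handle = model.blocks[-1].attn.register_forward_hook(grab)
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
handle.remove()
print(logits.shape, attn_maps[0].shape)  # (1, 1000), (1, 12, 197, 197)
```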
This file uses the PretrainedModel to render an attention layer as an image. This helps us see which parts of the image the model focused on.
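A minimal sketch of turning such an attention map into an image, assuming weights of shape (1, heads, 197, 197) for a 16x16-patch, 224x224 ViT (e.g. as captured by the hook sketch above); the actual visualization in this repo may differ:

```python
# Take the CLS token's attention to the 196 patch tokens, average it over
# heads, and upsample the 14x14 grid into a color heatmap.
import cv2
import numpy as np
import torch

def attention_to_heatmap(attn, grid=14, out_size=(224, 224)):
    cls_attn = attn.mean(1)[0, 0, 1:]            # (196,): CLS -> patch attention
    heat = cls_attn.reshape(grid, grid).numpy()  # 14x14 patch grid
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    heat = cv2.resize(heat, out_size, interpolation=cv2.INTER_CUBIC)
    return cv2.applyColorMap(np.uint8(255 * heat), cv2.COLORMAP_JET)

# Demo with random weights; in practice `attn` comes from the model,
# e.g. attn_maps[0] from the hook sketch above.
heatmap = attention_to_heatmap(torch.rand(1, 12, 197, 197).softmax(dim=-1))
cv2.imwrite("attention.png", heatmap)
```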