
Saksham0109/VisionTransformer


VisionTransformer

This repository contains all the code I wrote or used while studying Vision Transformers (ViT), based on the research paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale": https://arxiv.org/abs/2010.11929

AnImageIsWorth16x16Words

This Jupyter notebook contains the code I wrote for the model, along with explanations.
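The first step of the model, as described in the paper, is to cut the image into fixed-size patches, flatten each one, project it linearly, prepend a learnable [class] token and add position embeddings (Eq. 1 of the paper). The following is a minimal NumPy sketch of that step, not the notebook's code; the function names `patchify` and `embed` and the argument names `w_proj`, `cls_token` and `pos_embed` are hypothetical, and in a real model these arrays would be learned parameters.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened P x P patches.

    Returns an array of shape (N, P*P*C) with N = (H // P) * (W // P),
    patches ordered row by row.
    """
    h, w, c = image.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image size must be divisible by the patch size"
    patches = image.reshape(h // p, p, w // p, p, c)  # (rows, p, cols, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4)        # (rows, cols, p, p, C)
    return patches.reshape(-1, p * p * c)


def embed(image, w_proj, cls_token, pos_embed, patch_size=16):
    """z_0 = [x_class; x_p^1 E; ...; x_p^N E] + E_pos (hypothetical parameter names)."""
    x = patchify(image, patch_size) @ w_proj           # linear projection E: (N, D)
    x = np.concatenate([cls_token[None, :], x], axis=0)  # prepend [class] token: (N + 1, D)
    return x + pos_embed                               # add position embeddings


rng = np.random.default_rng(0)
D = 768
z0 = embed(
    rng.standard_normal((224, 224, 3)),
    rng.standard_normal((16 * 16 * 3, D)) * 0.02,
    np.zeros(D),
    rng.standard_normal((197, D)) * 0.02,
)
print(z0.shape)  # (197, 768): 14 x 14 = 196 patches plus the [class] token
```

For a 224 x 224 image and 16 x 16 patches this gives the familiar sequence length of 197 tokens that the Transformer encoder then consumes.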

PretrainedViT

This imports a ViT model pretrained on ImageNet and applies it to an input video, producing an output video whose frames show the predicted class as text.

vid.mp4 and video.mp4 are examples of the input and output video, respectively.
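Applying an image classifier to a video amounts to classifying each frame independently and turning the top-1 prediction into a caption. Below is a minimal sketch of that per-frame loop, assuming a hypothetical `model` callable that maps one preprocessed frame to a vector of ImageNet logits; it is not the notebook's code and leaves out video decoding and encoding.

```python
from typing import Callable, Iterable, List

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def label_frames(
    frames: Iterable[np.ndarray],
    model: Callable[[np.ndarray], np.ndarray],  # hypothetical: frame -> class logits
    class_names: List[str],
) -> List[str]:
    """Return one caption per frame, e.g. 'tabby cat (87.3%)'."""
    captions = []
    for frame in frames:
        probs = softmax(model(frame))
        k = int(np.argmax(probs))  # top-1 class index
        captions.append(f"{class_names[k]} ({probs[k]:.1%})")
    return captions


# Usage with a dummy model that always favours the second class.
dummy_model = lambda frame: np.array([0.0, 2.0, 0.0])
print(label_frames([np.zeros((224, 224, 3))], dummy_model, ["a", "b", "c"]))
```

In a full pipeline each caption would be drawn onto its frame before the frame is written out; with OpenCV, for instance, that could be done with `cv2.VideoCapture`, `cv2.putText` and `cv2.VideoWriter`, although the notebook may use a different library.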

PretrainedModel

This is the code for a pretrained model that I modified so that it returns the attention weights alongside its output.
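Standard ViT implementations compute the softmax attention weights inside each self-attention layer and discard them once the layer output is formed. The change described above amounts to returning those weights as well. Here is a minimal NumPy sketch of a multi-head self-attention layer that does this. The fused query/key/value projection `w_qkv` and the output projection `w_out` are assumptions that mirror a common layout; this is not the modified model's actual code.

```python
import numpy as np


def multi_head_attention(x, w_qkv, w_out, num_heads):
    """Multi-head self-attention that returns (output, attention_weights).

    x: (T, D) token sequence; w_qkv: (D, 3D); w_out: (D, D) (hypothetical names).
    attention_weights has shape (num_heads, T, T); each row sums to 1.
    """
    t, d = x.shape
    hd = d // num_heads                                   # per-head dimension D_h
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)             # each (T, D)
    # Reshape each to (heads, T, D_h).
    q, k, v = (m.reshape(t, num_heads, hd).transpose(1, 0, 2) for m in (q, k, v))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)       # (heads, T, T)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    out = (weights @ v).transpose(1, 0, 2).reshape(t, d)  # concatenate heads
    return out @ w_out, weights                           # keep the weights for visualisation


rng = np.random.default_rng(0)
out, attn = multi_head_attention(
    rng.standard_normal((197, 64)), rng.standard_normal((64, 192)) * 0.1, np.eye(64), num_heads=4
)
print(out.shape, attn.shape)  # (197, 64) (4, 197, 197)
```

Collecting `attn` from every encoder layer gives the per-layer attention maps that a visualisation step can consume.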

VisualisingAttention

This file uses PretrainedModel to render the attention weights as an image. This shows which parts of the input image the model focused on.
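This README does not say how the attention weights are turned into an image. One common approach, which the ViT paper itself uses for its attention-map figures, is Attention Rollout (Abnar & Zuidema, 2020). It averages each layer's attention over heads, adds the identity to account for residual connections, re-normalises, and multiplies the matrices across layers. The [class] token's row then gives a score per patch. The sketch below assumes per-layer weights shaped (heads, T, T) with the [class] token at index 0, and it may differ from what VisualisingAttention actually does.

```python
import numpy as np


def attention_rollout(attentions, grid_size):
    """Attention rollout for a ViT with a [class] token at position 0.

    attentions: list of per-layer arrays of shape (heads, T, T), T = 1 + grid_size**2.
    Returns a (grid_size, grid_size) map scaled to [0, 1].
    """
    t = attentions[0].shape[-1]
    rollout = np.eye(t)
    for attn in attentions:
        a = attn.mean(axis=0)                  # average over heads
        a = a + np.eye(t)                      # account for residual connections
        a = a / a.sum(axis=-1, keepdims=True)  # re-normalise each row
        rollout = a @ rollout                  # propagate through this layer
    cls_to_patches = rollout[0, 1:]            # [class] row, without the [class] column
    mask = cls_to_patches.reshape(grid_size, grid_size)
    return (mask - mask.min()) / (mask.max() - mask.min() + 1e-8)


rng = np.random.default_rng(0)
layers = [rng.random((12, 197, 197)) for _ in range(12)]  # stand-in attention weights
mask = attention_rollout(layers, grid_size=14)
heatmap = np.kron(mask, np.ones((16, 16)))                # upsample to 224 x 224 pixels
print(mask.shape, heatmap.shape)  # (14, 14) (224, 224)
```

The upsampled `heatmap` can then be blended with the input image, for example as an alpha mask, to highlight the regions the [class] token attended to.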
