
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

Introduction | Updates | Usage | Results & Pretrained Models | Statement

Introduction

This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. ViTAE stacks reduction cells and normal cells to introduce scale invariance and locality into vision transformers.
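
As a rough conceptual sketch of the locality idea (not the repository's actual implementation), a normal cell can be pictured as self-attention running in parallel with a small convolution, with the two branches fused; the class name below is hypothetical:

import torch
import torch.nn as nn

class LocalityAttentionBlock(nn.Module):
    """Hypothetical illustration of fusing a local conv branch with attention."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads)
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)  # local branch

    def forward(self, x, h, w):
        # x: (B, N, C) token sequence with N == h * w
        y = self.norm(x).transpose(0, 1)       # (N, B, C) for nn.MultiheadAttention
        a, _ = self.attn(y, y, y)
        a = a.transpose(0, 1)                  # global branch, back to (B, N, C)
        c = x.transpose(1, 2).reshape(x.shape[0], x.shape[2], h, w)
        c = self.conv(c).flatten(2).transpose(1, 2)  # local branch
        return x + a + c                       # fuse global and local features

blk = LocalityAttentionBlock(64)
print(blk(torch.randn(2, 196, 64), 14, 14).shape)  # torch.Size([2, 196, 64])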

Updates

07/12/2021 The code is released!

19/10/2021 The paper is accepted by NeurIPS 2021! The code will be released soon!

06/08/2021 The paper is posted on arXiv! The code will be made publicly available once cleaned up.

Usage

Install

  • Clone this repo:
git clone https://github.com/Annbless/ViTAE.git
cd ViTAE
  • Create a conda virtual environment and activate it:
conda create -n vitae python=3.7 -y
conda activate vitae
conda install pytorch==1.8.1 torchvision==0.9.1 cudatoolkit=10.2 -c pytorch -c conda-forge
  • Install timm==0.3.4:
pip install timm==0.3.4
  • Install Apex:
git clone https://github.com/NVIDIA/apex
cd apex
git reset --hard a651e2c24ecf97cbf367fd3f330df36760e1c597
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
  • Install other requirements:
pip install pyyaml ipdb
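
After installation, a quick sanity check can confirm the versions match the pins above (a minimal sketch; the final import assumes Apex's C++/CUDA extensions built successfully):

import torch
import torchvision
import timm

print(torch.__version__)          # expect 1.8.1
print(torchvision.__version__)    # expect 0.9.1
print(timm.__version__)           # expect 0.3.4
print(torch.cuda.is_available())  # should be True for GPU training

# this import only succeeds if Apex was built with --cpp_ext and --cuda_ext
from apex import amp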

Data Preparation

We use the standard ImageNet dataset; you can download it from http://image-net.org/. The file structure should look like:

$ tree data
imagenet
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img4.jpeg
    │   ├── img5.jpeg
    │   └── ...
    ├── class2
    │   ├── img6.jpeg
    │   └── ...
    └── ...
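
Because this is the standard class-subfolder layout, torchvision's ImageFolder can verify it before training. A minimal sketch (data/imagenet stands in for your own [ImageNetPath]):

import torchvision.datasets as datasets

# ImageFolder expects one subfolder per class, exactly as in the tree above
train_set = datasets.ImageFolder('data/imagenet/train')
val_set = datasets.ImageFolder('data/imagenet/val')

print(len(train_set.classes))  # expect 1000 classes for ImageNet-1K
print(len(train_set))          # expect ~1.28M training images
print(len(val_set))            # expect 50000 validation images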

Evaluation

Taking ViTAE_basic_7 as an example, to evaluate the pretrained ViTAE model on the ImageNet validation set, run

python validate.py [ImageNetPath] --model ViTAE_basic_7 --eval_checkpoint [Checkpoint Path]
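
The weights can also be loaded programmatically. This is a minimal sketch: it assumes ViTAE_basic_7 is registered in timm's model registry by the repository's model definitions (imported here as a hypothetical models module) and that the checkpoint may wrap the weights in a state_dict key; the checkpoint path is a placeholder:

import torch
import timm
import models  # hypothetical import; loading the repo's model file registers ViTAE variants with timm

model = timm.create_model('ViTAE_basic_7', num_classes=1000)
ckpt = torch.load('vitae_basic_7.pth', map_location='cpu')  # placeholder checkpoint path
model.load_state_dict(ckpt.get('state_dict', ckpt))  # handle both wrapped and raw checkpoints
model.eval()

# dummy forward pass at the training resolution
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # expected: torch.Size([1, 1000])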

Training

Taking ViTAE_basic_7 as an example, to train the ViTAE model on ImageNet with 4 GPUs and a total batch size of 512, run

python -m torch.distributed.launch --nproc_per_node=4 main.py [ImageNetPath] --model ViTAE_basic_7 -b 128 --lr 1e-3 --weight-decay .03 --img-size 224 --amp

Here -b 128 is the per-GPU batch size, so 4 GPUs give the total batch size of 512. The trained model files will be saved under the output folder.
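
timm-style training scripts such as main.py typically write checkpoints like model_best.pth.tar into a timestamped subfolder of output; the exact path below follows that convention but is an assumption for illustration:

import torch

# the subfolder name (timestamp-model-resolution) is hypothetical here;
# adjust it to whatever main.py actually created under output/
ckpt = torch.load('output/train/20210712-000000-ViTAE_basic_7-224/model_best.pth.tar',
                  map_location='cpu')
print(sorted(ckpt.keys()))  # typically includes 'state_dict', 'optimizer', 'epoch'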

Results

Main Results on ImageNet-1K with pretrained models

name        resolution   acc@1   acc@5   acc@RealTop-1   Pretrained
ViTAE-T     224x224      75.3    92.7    82.9            Coming Soon
ViTAE-6M    224x224      77.9    94.1    84.9            Coming Soon
ViTAE-13M   224x224      81.0    95.4    86.9            Coming Soon
ViTAE-S     224x224      82.0    95.9    87.0            Coming Soon

Statement

This project is for research purposes only. For any further questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.
