# Reversible Vision Transformers

Official PyTorch implementation of the Rev-ViT and Rev-MViT models from the following paper:

**Reversible Vision Transformers**
Karttikeya Mangalam\*, Haoqi Fan, Yanghao Li, Chao-Yuan Wu, Bo Xiong, Christoph Feichtenhofer\*, Jitendra Malik
CVPR 2022 (Oral)

Project Homepage: https://karttikeya.github.io/publication/revvit/
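The core idea is a reversible residual coupling between two activation streams, so that a block's inputs can be recomputed exactly from its outputs instead of being cached for backpropagation. Below is a minimal, illustrative PyTorch sketch of a RevNet-style block with attention and an MLP as the two residual sub-functions; it is not the code in this repo, and the class name and hyperparameters are placeholders.

```python
# Illustrative sketch (not this repo's implementation): a RevNet-style
# reversible coupling in the spirit of the two-residual-stream design
# described in the paper. F = attention sub-block, G = MLP sub-block.
import torch
import torch.nn as nn


class ReversibleBlockSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def f(self, x):
        # Attention sub-block applied to one stream.
        y = self.norm1(x)
        return self.attn(y, y, y, need_weights=False)[0]

    def g(self, x):
        # MLP sub-block applied to the other stream.
        return self.mlp(self.norm2(x))

    def forward(self, x1, x2):
        # Two-stream coupling: each stream is updated by a function of the other.
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    @torch.no_grad()
    def inverse(self, y1, y2):
        # Recover the block's inputs from its outputs (up to floating-point error),
        # so intermediate activations need not be stored during training.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```

In the paper's design, the tokens are carried in two residual streams that are fused at the end of the network; during the backward pass each block's inputs are recomputed on the fly via the inverse, trading a small amount of extra compute for a large reduction in activation memory.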



## Pretrained Models

### ImageNet

| Architecture | #params (M) | FLOPs (G) | Top-1 (%) | Weights | Config |
| --- | --- | --- | --- | --- | --- |
| Rev-ViT-S | 22 | 4.6 | 79.9 | link | ImageNet/REV_VIT_S |
| Rev-ViT-B | 87 | 17.6 | 81.8 | link | ImageNet/REV_VIT_B |
| Rev-MViT-B | 39 | 8.7 | 82.9* | link | ImageNet/REV_MVIT_B_16_CONV |

\* Improved from the 82.5% reported in Table 1 of the paper.

### Kinetics 400

| Architecture | Frame length x sample rate | Top-1 (%) | Top-5 (%) | FLOPs (G) x views | #params (M) | Weights | Config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Rev-MViT-B | 16 x 4 | 78.4 | 93.4 | 64 x 1 x 5 | 34.9 | link | Kinetics/REV_MVIT_B_16x4_CONV |

## Getting started

To use the Rev-ViT (or Rev-MViT) image models, please refer to the configs under `configs/ImageNet` and see the paper for details. For example, the command

```
python tools/run_net.py \
  --cfg configs/ImageNet/REV_VIT_B.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 1
```
should train and test a Reversible ViT-Base image model (trained with the DeiT recipe) on your dataset. Please refer to the general repo-level instructions for further details.
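If you only want to evaluate a downloaded checkpoint, a test-only run can typically be launched with the standard PySlowFast command-line overrides. The sketch below assumes the usual `TRAIN.ENABLE` and `TEST.CHECKPOINT_FILE_PATH` config keys apply to this repo, and `path_to_your_checkpoint` is a placeholder:

```
python tools/run_net.py \
  --cfg configs/ImageNet/REV_VIT_B.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 1 \
  TRAIN.ENABLE False \
  TEST.CHECKPOINT_FILE_PATH path_to_your_checkpoint
```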

For the Rev-MViT video models, please run the `configs/Kinetics` configs as follows:


```
python tools/run_net.py \
  --cfg configs/Kinetics/REV_MVIT_B_16x4_CONV.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 1
```
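To scale this run across more GPUs, the batch size can also be overridden on the command line. The sketch below assumes the standard PySlowFast `TRAIN.BATCH_SIZE` and `TEST.BATCH_SIZE` keys; the values shown are illustrative, not the recipe used for the released checkpoint:

```
python tools/run_net.py \
  --cfg configs/Kinetics/REV_MVIT_B_16x4_CONV.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 8 \
  TRAIN.BATCH_SIZE 64 \
  TEST.BATCH_SIZE 64
```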

## Citing Rev-ViT & Rev-MViT

If you find Reversible Models useful for your research, please consider citing the paper using the following BibTeX entry.

```BibTeX
@inproceedings{mangalam2022,
  title     = {Reversible Vision Transformers},
  author    = {Mangalam, Karttikeya and Fan, Haoqi and Li, Yanghao and Wu, Chao-Yuan and Xiong, Bo and Feichtenhofer, Christoph and Malik, Jitendra},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
}
```