
# Multiscale Vision Transformers

Haoqi Fan*, Bo Xiong*, Karttikeya Mangalam*, Yanghao Li*, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer*
arXiv preprint arXiv:2104.11227, 2021. [Paper]


## Getting started

To use MViT-B models, please refer to the configs under `configs/Kinetics`, or see `MODEL_ZOO.md` for pre-trained models. See the paper for details. For example, the command

```shell
python tools/run_net.py \
  --cfg configs/Kinetics/MVIT-B.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset
```

should train and test an MViT-B model on your dataset.
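
Config options can also be overridden as `KEY VALUE` pairs on the command line, as `DATA.PATH_TO_DATA_DIR` is above. As a sketch (assuming the PySlowFast-style options `TRAIN.ENABLE`, `TEST.ENABLE`, and `TEST.CHECKPOINT_FILE_PATH` apply to this config), evaluating a downloaded checkpoint without training might look like:

```shell
# Sketch: run evaluation only, loading weights from a pre-trained
# checkpoint (path is a placeholder; option names assume the
# PySlowFast config schema).
python tools/run_net.py \
  --cfg configs/Kinetics/MVIT-B.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  TRAIN.ENABLE False \
  TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH path_to_your_checkpoint
```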

## Citing MViT

If you find MViT useful for your research, please consider citing the paper using the following BibTeX entry.

```bibtex
@Article{mvit2021,
  author  = {Haoqi Fan and Bo Xiong and Karttikeya Mangalam and Yanghao Li and Zhicheng Yan and Jitendra Malik and Christoph Feichtenhofer},
  title   = {Multiscale Vision Transformers},
  journal = {arXiv preprint arXiv:2104.11227},
  year    = {2021},
}
```