
# Multiscale Vision Transformers

Haoqi Fan*, Bo Xiong*, Karttikeya Mangalam*, Yanghao Li*, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer*
arXiv preprint arXiv:2104.11227, 2021. [Paper]


## Getting started

To use MViT-B models, please refer to the configs under `configs/Kinetics`, or see `MODEL_ZOO.md` for pre-trained models. See the paper for details. For example, the command

```shell
python tools/run_net.py \
  --cfg configs/Kinetics/MVIT-B.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset
```

should train and test an MViT-B model on your dataset.
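
Config options can also be overridden as `KEY VALUE` pairs on the command line, as `DATA.PATH_TO_DATA_DIR` is above. As a sketch (assuming the PySlowFast-style options `TRAIN.ENABLE`, `TEST.ENABLE`, and `TEST.CHECKPOINT_FILE_PATH` apply to this config), evaluating a downloaded checkpoint without training might look like:

```shell
# Sketch: run evaluation only, loading weights from a pre-trained
# checkpoint (path is a placeholder; option names assume the
# PySlowFast config schema).
python tools/run_net.py \
  --cfg configs/Kinetics/MVIT-B.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  TRAIN.ENABLE False \
  TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH path_to_your_checkpoint
```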

## Citing MViT

If you find MViT useful for your research, please consider citing the paper using the following BibTeX entry.

```bibtex
@Article{mvit2021,
  author  = {Haoqi Fan and Bo Xiong and Karttikeya Mangalam and Yanghao Li and Zhicheng Yan and Jitendra Malik and Christoph Feichtenhofer},
  title   = {Multiscale Vision Transformers},
  journal = {arXiv preprint arXiv:2104.11227},
  year    = {2021},
}
```