Chen Wei*, Haoqi Fan, Saining Xie, Chao-Yuan Wu, Alan Yuille, Christoph Feichtenhofer*
In CVPR, 2022. [Paper]
name | top1 | config pre-train (PT) | config fine-tune | model PT |
---|---|---|---|---|
ViT-B | 84.0 | in1k_VIT_B_MaskFeat_PT | in1k_VIT_B_MaskFeat_FT | link |
ViT-L | 85.7 | in1k_VIT_L_MaskFeat_PT | in1k_VIT_L_MaskFeat_FT | link |
name | frame length x sample rate | top1 | Flops (G) x views | #params (M) | config pre-train (PT) | config fine-tune | model PT |
---|---|---|---|---|---|---|---|
MViT-S | 16 x 4 | 82.2 | 71 x 1 x 10 | 36 | k400_MVITv2_S_16x4_MaskFeat_PT | k400_MVITv2_S_16x4_FT | link |
MViT-L | 16 x 4 | 84.3 | 377 x 1 x 10 | 218 | k400_MVITv2_L_16x4_MaskFeat_PT | k400_MVITv2_L_16x4_FT | link |
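The "Flops (G) x views" cells can be multiplied out to get a rough total inference cost per video. The sketch below assumes the usual video-benchmark reading of the cell (cost of one view x spatial crops x temporal clips), which the table itself does not spell out. The helper name `total_gflops` is hypothetical and not part of the codebase.

```python
def total_gflops(cell: str) -> int:
    """Multiply out a 'flops x spatial x temporal' table cell, e.g. '71 x 1 x 10'.

    Assumes the cell means: GFLOPs per view x spatial crops x temporal clips.
    """
    total = 1
    for factor in cell.split("x"):
        total *= int(factor.strip())
    return total


# Cells copied from the Kinetics-400 table above.
for name, cell in [("MViT-S", "71 x 1 x 10"), ("MViT-L", "377 x 1 x 10")]:
    print(name, total_gflops(cell), "GFLOPs per video")
```

Under this reading, the 10-view protocol costs ten times the single-view figure for both models.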
To use self-supervised learning techniques, please refer to the configs under configs/masked_ssl. For example, the command
python tools/run_net.py \
--cfg configs/masked_ssl/k400_MVITv2_L_16x4_MaskFeat_PT.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_Kinetics_dataset
should pre-train a MaskFeat MViT-L model on the Kinetics-400 dataset, and the command
python tools/run_net.py \
--cfg configs/masked_ssl/k400_MVITv2_L_16x4_FT.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_Kinetics_dataset \
TRAIN.CHECKPOINT_FILE_PATH path_to_your_pretrain_checkpoint
will fine-tune the resulting model, with the pre-trained checkpoint path passed in through TRAIN.CHECKPOINT_FILE_PATH.
For images, the command
python tools/run_net.py \
--cfg configs/masked_ssl/in1k_VIT_B_MaskFeat_PT.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_ImageNet_dataset
should pre-train a MaskFeat ViT-B model on the ImageNet dataset, and the command
python tools/run_net.py \
--cfg configs/masked_ssl/in1k_VIT_B_MaskFeat_FT.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_ImageNet_dataset \
TRAIN.CHECKPOINT_FILE_PATH path_to_your_pretrain_checkpoint
will fine-tune the resulting model, with the pre-trained checkpoint path passed in through TRAIN.CHECKPOINT_FILE_PATH.
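The pre-train and fine-tune commands above share one shape: a config file plus KEY VALUE overrides. As a minimal sketch, the two stages can be assembled from a config name taken from the tables. The helper `run_net_command` and the constant `CONFIG_DIR` are hypothetical names introduced for illustration. Only `tools/run_net.py`, `--cfg`, and the two override keys come from the commands shown above.

```python
import shlex

# Directory holding the MaskFeat configs, as referenced above.
CONFIG_DIR = "configs/masked_ssl"


def run_net_command(config_name, data_dir, checkpoint=None):
    """Build the argument list for tools/run_net.py (hypothetical helper).

    config_name: a config name from the tables, without the .yaml suffix.
    data_dir:    value for the DATA.PATH_TO_DATA_DIR override.
    checkpoint:  optional value for TRAIN.CHECKPOINT_FILE_PATH (fine-tuning).
    """
    cmd = [
        "python", "tools/run_net.py",
        "--cfg", f"{CONFIG_DIR}/{config_name}.yaml",
        "DATA.PATH_TO_DATA_DIR", data_dir,
    ]
    if checkpoint is not None:
        cmd += ["TRAIN.CHECKPOINT_FILE_PATH", checkpoint]
    return cmd


# Stage 1: MaskFeat pre-training on Kinetics-400.
pretrain = run_net_command(
    "k400_MVITv2_L_16x4_MaskFeat_PT", "path_to_your_Kinetics_dataset"
)
# Stage 2: fine-tuning from the pre-trained checkpoint.
finetune = run_net_command(
    "k400_MVITv2_L_16x4_FT",
    "path_to_your_Kinetics_dataset",
    checkpoint="path_to_your_pretrain_checkpoint",
)

print(shlex.join(pretrain))
print(shlex.join(finetune))
```

The printed lines can be pasted into a shell at the repository root, or the lists can be handed to `subprocess.run(cmd, check=True)`. The image models follow the same pattern with the ImageNet config names and dataset path.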
If you find this useful for your research, please consider citing the paper using the following BibTeX entry.
@InProceedings{wei2022masked,
author = {Wei, Chen and Fan, Haoqi and Xie, Saining and Wu, Chao-Yuan and Yuille, Alan and Feichtenhofer, Christoph},
title = {Masked Feature Prediction for Self-Supervised Visual Pre-Training},
booktitle = {CVPR},
year = {2022},
}