🚀 All models are trained for 300 epochs on ImageNet-1K with the same training recipe as DeiT and CoaT.
model | resolution | acc@1 | #params | FLOPs | weight |
---|---|---|---|---|---|
MPViT-T | 224x224 | 78.2 | 5.8M | 1.6G | weight |
MPViT-XS | 224x224 | 80.9 | 10.5M | 2.9G | weight |
MPViT-S | 224x224 | 83.0 | 22.8M | 4.7G | weight |
MPViT-B | 224x224 | 84.3 | 74.8M | 16.4G | weight |
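To inspect a variant programmatically, the sketch below builds one of these models through timm's registry and counts its parameters. It assumes the repo's `mpvit.py` registers the `mpvit_*` names with timm (as the `--model` flags in the commands below suggest); adjust the import to match your checkout.

```python
# Illustrative sketch (not from the repo): build an MPViT variant via timm's
# model registry and count its parameters.
import timm
import mpvit  # noqa: F401  -- assumed to register mpvit_tiny/xsmall/small/base with timm

model = timm.create_model("mpvit_small", num_classes=1000)
n_params = sum(p.numel() for p in model.parameters())
print(f"mpvit_small parameters: {n_params / 1e6:.1f}M")  # table above lists 22.8M
```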
Please refer to DeiT for data preparation.
We recommend creating a symlink `data/ImageNet` that points to your ImageNet dataset path, as follows:
# symlink the ImageNet dataset
cd MPViT
mkdir data
ln -s /path_to_imagenet_dataset data/ImageNet
Check that the `data/ImageNet` symlink points to `/path_to_imagenet_dataset` correctly:
# ImageNet -> /path_to_imagenet_dataset
ll data
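As an optional sanity check of the layout, the sketch below (not part of the repo) loads both splits with `torchvision.datasets.ImageFolder`, assuming the standard DeiT-style structure of `train/` and `val/` with one sub-folder per class:

```python
# Optional sanity check (illustrative, not part of the repo): make sure the
# symlinked dataset has the ImageFolder layout that DeiT-style scripts expect,
# i.e. data/ImageNet/{train,val}/<class_name>/<image>.JPEG
from torchvision import datasets

for split in ("train", "val"):
    ds = datasets.ImageFolder(f"data/ImageNet/{split}")
    print(f"{split}: {len(ds)} images, {len(ds.classes)} classes")
# Expected: 1000 classes in each split and 50000 images in val.
```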
We use `pytorch==1.7.0`, `torchvision==0.8.1`, `cuda==10.1`, `einops`, and pytorch-image-models (`timm`).
Please refer to the official PyTorch installation guide.
# Install PyTorch and torchvision
conda install pytorch==1.7.0 torchvision==0.8.1 -c pytorch
# Install timm
pip install timm
# Install einops
pip install einops
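To confirm the environment matches the versions above, a quick illustrative check:

```python
# Illustrative environment check: print the versions this README assumes.
import torch
import torchvision
import timm
import einops

print("torch:", torch.__version__)              # expected 1.7.0
print("torchvision:", torchvision.__version__)  # expected 0.8.1
print("timm:", timm.__version__)
print("einops:", einops.__version__)
print("CUDA:", torch.version.cuda, "| available:", torch.cuda.is_available())
```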
To evaluate a pre-trained MPViT on ImageNet val with a single GPU, run:
MPViT-Base:
python main.py --eval --resume https://dl.dropbox.com/s/la8w31m0apj2830/mpvit_base.pth --model mpvit_base
This should give
* Acc@1 84.292 Acc@5 96.820 loss 0.916
Accuracy of the network on the 50000 test images: 84.3%
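Under the hood, `--resume` with a URL is handled DeiT-style: the checkpoint is downloaded via `torch.hub` and the state dict is read from its `'model'` key. A minimal sketch of that loading step (the key name and the timm registration are assumptions carried over from the DeiT convention, not guarantees about this repo):

```python
# Minimal sketch of what `--eval --resume <url>` does, assuming the DeiT-style
# checkpoint format (state dict stored under the 'model' key) and that mpvit.py
# registers the model names with timm -- both are assumptions, not repo code.
import timm
import torch
import mpvit  # noqa: F401

url = "https://dl.dropbox.com/s/la8w31m0apj2830/mpvit_base.pth"
checkpoint = torch.hub.load_state_dict_from_url(url, map_location="cpu")
model = timm.create_model("mpvit_base", num_classes=1000)
model.load_state_dict(checkpoint["model"])
model.eval()  # ready for the validation loop
```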
Here you'll find the command lines to reproduce the inference results for the other MPViT models.
MPViT-Tiny:
python main.py --eval --resume https://dl.dropbox.com/s/1cmquqyjmaeeg1n/mpvit_tiny.pth --model mpvit_tiny
giving
* Acc@1 78.210 Acc@5 94.270 loss 1.112
Accuracy of the network on the 50000 test images: 78.2%
MPViT-Xsmall:
python main.py --eval --resume https://dl.dropbox.com/s/vvpq2m474g8tvyq/mpvit_xsmall.pth --model mpvit_xsmall
giving
* Acc@1 80.896 Acc@5 95.352 loss 1.039
Accuracy of the network on the 50000 test images: 80.9%
MPViT-Small:
python main.py --eval --resume https://dl.dropbox.com/s/y3dnmmy8h4npz7a/mpvit_small.pth --model mpvit_small
giving
* Acc@1 82.960 Acc@5 96.154 loss 0.893
Accuracy of the network on the 50000 test images: 83.0%
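The Acc@1 / Acc@5 figures above are the usual top-k accuracies over the 50,000 validation images. For reference, a generic top-k helper equivalent in spirit to the metric printed by `main.py` (this sketch is not the repo's code):

```python
# Generic top-k accuracy sketch (not copied from the repo), matching the
# meaning of the Acc@1/Acc@5 numbers printed by main.py.
import torch


def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, ks=(1, 5)):
    """Return top-k accuracies in percent for [N, C] logits and [N] targets."""
    maxk = max(ks)
    _, pred = logits.topk(maxk, dim=1)          # [N, maxk] class indices
    correct = pred.eq(targets.view(-1, 1))      # [N, maxk] hit matrix
    return [correct[:, :k].any(dim=1).float().mean().item() * 100 for k in ks]


# Toy usage with random data:
acc1, acc5 = topk_accuracy(torch.randn(8, 1000), torch.randint(0, 1000, (8,)))
print(f"Acc@1 {acc1:.3f} Acc@5 {acc5:.3f}")
```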
To train MPViT models on ImageNet on a single node with 8 GPUs for 300 epochs, run:
python -m torch.distributed.launch --nproc_per_node=<num-of-gpus-to-use> --use_env main.py \
--model <mpvit_model> --data-path <imagenet-path> --batch-size <batch-size-per-gpu> --output_dir <output-directory>
MPViT-Tiny:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
--model mpvit_tiny --batch-size 128 --data-path <imagenet-path> --output_dir <output-directory>
MPViT-Xsmall:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
--model mpvit_xsmall --batch-size 128 --data-path <imagenet-path> --output_dir <output-directory>
MPViT-Small:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
--model mpvit_small --batch-size 128 --drop-path 0.05 --data-path <imagenet-path> --output_dir <output-directory>
MPViT-Base:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
--model mpvit_base --batch-size 128 --drop-path 0.3 --data-path <imagenet-path> --output_dir <output-directory>
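Note that `--use_env` makes the launcher pass `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` through environment variables rather than a `--local_rank` argument, and the effective global batch size is `--batch-size` × the number of GPUs (128 × 8 = 1024 here). A minimal sketch of the distributed setup this implies (illustrative only, not the repo's actual initialization code):

```python
# Illustrative sketch (not the repo's code) of the setup implied by
# `torch.distributed.launch --use_env`: the launcher exports RANK, WORLD_SIZE
# and LOCAL_RANK, and the script builds the process group from the environment.
import os

import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
dist.init_process_group(backend="nccl", init_method="env://")

per_gpu_batch = 128
global_batch = per_gpu_batch * dist.get_world_size()  # 128 x 8 = 1024 on one node
if dist.get_rank() == 0:
    print(f"world size {dist.get_world_size()}, effective batch size {global_batch}")
```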