# Initializing Models with Larger Ones

Official code release for **Initializing Models with Larger Ones**, ICLR 2024 (Spotlight).

Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu

University of Pennsylvania, UC Berkeley, MBZUAI, and Meta AI Research
We introduce **weight selection**, a method for initializing a model by selecting a subset of weights from a pretrained larger model. At no extra cost, it improves the accuracy of the smaller model and reduces the training time needed to reach a given accuracy.
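For intuition, here is a minimal sketch of the idea, assuming the small and large models share parameter names and differ only in width (the function names are hypothetical, and this sketch omits the layer selection the method uses when the models also differ in depth; see the released `weight_selection.py` for the actual implementation):

```python
import torch

def uniform_element_selection(weight, target_shape):
    """Shrink a larger weight tensor to target_shape by keeping
    uniformly spaced indices along each oversized dimension."""
    assert weight.dim() == len(target_shape)
    for dim, size in enumerate(target_shape):
        if weight.shape[dim] != size:
            idx = torch.linspace(0, weight.shape[dim] - 1, steps=size).round().long()
            weight = weight.index_select(dim, idx)
    return weight

def initialize_by_weight_selection(large_state_dict, small_model):
    """Copy a subset of the larger model's weights into small_model,
    matching parameters by name."""
    new_state = {}
    for name, param in small_model.state_dict().items():
        if name in large_state_dict:
            new_state[name] = uniform_element_selection(large_state_dict[name], param.shape)
        else:
            new_state[name] = param  # keep random init for unmatched parameters
    small_model.load_state_dict(new_state)
```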
## Installation

Please check [INSTALL.md](INSTALL.md) for installation instructions.
## Weight Selection

First run `weight_selection.py` to obtain the initialization file. We obtain the pretrained models via timm 0.6.12; the name for `--pretrained_model` may differ in other timm versions.
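If your timm version uses different model names, you can search its registry, e.g.:

```python
import timm

# List the pretrained ViT-S variants available in the installed timm version.
print([name for name in timm.list_models(pretrained=True) if "vit_small" in name])
```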
**ViT-T initialization from ImageNet-21K pretrained ViT-S**

```
python3 weight_selection.py \
--output_dir /path/to/weight_selection/ \
--model_type vit \
--pretrained_model vit_small_patch16_224_in21k
```
**ConvNeXt-F initialization from ImageNet-21K pretrained ConvNeXt-T**

```
python3 weight_selection.py \
--output_dir /path/to/weight_selection/ \
--model_type convnext \
--pretrained_model convnext_tiny_in22k
```
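The script saves the selected weights under `--output_dir`. As a quick sanity check, you can inspect the saved file; the filename below is hypothetical, so substitute whatever `weight_selection.py` actually writes:

```python
import torch

# Hypothetical filename; replace with the file written to --output_dir.
ckpt = torch.load("/path/to/weight_selection/checkpoint.pth", map_location="cpu")
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))  # shapes should match the smaller model
```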
## Training

We list commands for training ViT-T and ConvNeXt-F on CIFAR-100 and ImageNet-1K.

- To run the baseline (training from random initialization), remove the `--initialize` argument.
- The effective batch size is `nproc_per_node` × `batch_size` × `update_freq` (gradient accumulation steps).
**ViT-T from weight selection on CIFAR-100**

```
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --warmup_epochs 50 --epochs 300 \
--batch_size 64 --lr 2e-3 --update_freq 1 --use_amp true \
--initialize /path/to/weight_selection \
--data_path /path/to/data/ \
--data_set CIFAR100 \
--output_dir /path/to/results/
```
**ConvNeXt-F from weight selection on CIFAR-100**

```
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model convnext_femto --warmup_epochs 50 --epochs 300 --drop_path 0.1 \
--batch_size 128 --lr 4e-3 --update_freq 1 --use_amp true \
--initialize /path/to/weight_selection \
--data_path /path/to/data/ \
--data_set CIFAR100 \
--output_dir /path/to/results/
```
**ViT-T from weight selection on ImageNet-1K**

```
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 --use_amp true \
--initialize /path/to/weight_selection \
--data_path /path/to/data/ \
--output_dir /path/to/results/
```
**ConvNeXt-F from weight selection on ImageNet-1K**

```
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model convnext_femto --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 --use_amp true \
--initialize /path/to/weight_selection \
--data_path /path/to/data/ \
--output_dir /path/to/results/
```
## Results

Accuracy (%) on CIFAR-100 and ImageNet-1K when initializing ViT-T and ConvNeXt-F with weight selection from ImageNet-21K pretrained ViT-S and ConvNeXt-T:

| setting | ViT-T | ConvNeXt-F |
|---|---|---|
| train from random init (CIFAR-100) | 72.4 | 81.3 |
| weight selection (CIFAR-100) | 81.4 | 84.4 |
| train from random init (ImageNet-1K) | 73.9 | 76.1 |
| weight selection (ImageNet-1K) | 75.6 | 76.4 |
## Acknowledgement

This repository is built using the timm library and the Dropout codebase.
## Citation

If you find this repository helpful, please consider citing:

```bibtex
@inproceedings{xu2024initializing,
  title={Initializing Models with Larger Ones},
  author={Zhiqiu Xu and Yanjie Chen and Kirill Vishniakov and Yida Yin and Zhiqiang Shen and Trevor Darrell and Lingjie Liu and Zhuang Liu},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024},
}
```