Official PyTorch implementation of DeBiFormer, from the following paper:
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention. ACCV 2024.
Nguyen Huu Bao Long, Chenyu Zhang, Yuzhi Shi, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, and Tohgoroh Matsui
- 2024-09-21: The paper has been accepted at ACCV 2024!
| name | resolution | acc@1 | #params | FLOPs | model | log |
| --- | --- | --- | --- | --- | --- | --- |
| DeBiFormer-T | 224x224 | 81.9 | 21.4 M | 2.6 G | model | log |
| DeBiFormer-S | 224x224 | 83.9 | 44 M | 5.4 G | model | log |
| DeBiFormer-B | 224x224 | 84.4 | 77 M | 11.8 G | model | log |
First, clone the repository locally and install the dependencies:

git clone https://github.com/maclong01/DeBiFormer.git
cd DeBiFormer
pip3 install -r requirements.txt
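As a quick sanity check after installation, you can instantiate a model and run a dummy forward pass. The sketch below assumes the DeBiFormer variants are registered with timm's model registry (as in the BiFormer codebase) once the repository's model definitions are imported; the `models` import path is an assumption and may differ in this repository.

```python
# Hypothetical sanity check: build a DeBiFormer model and run a dummy forward pass.
# Assumes "debiformer_small" is registered with timm when the repo's model
# definitions are imported; the "models" import path below is a guess.
import torch
import timm

import models  # noqa: F401  (assumed module that triggers timm registration)

model = timm.create_model('debiformer_small', pretrained=False, num_classes=1000)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # expected: torch.Size([1, 1000])
```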
Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout expected by torchvision's `datasets.ImageFolder`, with the training and validation images in the `train/` and `val/` folders, respectively:
/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
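For reference, this layout can be loaded directly with torchvision's `ImageFolder`. The sketch below is not part of the training pipeline; it only illustrates the expected structure, using the standard ImageNet normalization constants.

```python
# Minimal sketch: load the layout above with torchvision's ImageFolder.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    # standard ImageNet mean/std
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder('/path/to/imagenet/train', transform=transform)
val_set = datasets.ImageFolder('/path/to/imagenet/val', transform=transform)
print(len(train_set.classes))  # 1000 classes for ImageNet-1k
```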
To train DeBiFormer-S on ImageNet using 8 GPUs for 300 epochs, run:
cd classification/
bash train.sh 8 --model debiformer_small --batch-size 256 --lr 5e-4 --warmup-epochs 20 --weight-decay 0.1 --data-path your_imagenet_path
To evaluate the performance of DeBiFormer-S on ImageNet using 8 GPUs, run:
cd classification/
bash train.sh 8 --model debiformer_small --batch-size 256 --lr 5e-4 --warmup-epochs 20 --weight-decay 0.1 --data-path your_imagenet_path --resume ../checkpoints/debiformer_small_in1k_224.pth --eval
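If you prefer a standalone check outside of `train.sh`, a single-GPU top-1 evaluation can be sketched as below. This assumes the model variants are registered with timm as noted earlier and that the checkpoint stores its weights under the usual "model" key; both are assumptions and may differ for the released checkpoints.

```python
# Hedged sketch of a standalone single-GPU top-1 accuracy check.
import torch
import timm
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

import models  # noqa: F401  (assumed module that registers DeBiFormer with timm)

model = timm.create_model('debiformer_small', num_classes=1000)
ckpt = torch.load('../checkpoints/debiformer_small_in1k_224.pth', map_location='cpu')
model.load_state_dict(ckpt.get('model', ckpt))  # "model" key is an assumption
model.cuda().eval()

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
loader = DataLoader(datasets.ImageFolder('/path/to/imagenet/val', transform),
                    batch_size=128, num_workers=8, pin_memory=True)

correct = total = 0
with torch.no_grad():
    for images, targets in loader:
        preds = model(images.cuda(non_blocking=True)).argmax(dim=1).cpu()
        correct += (preds == targets).sum().item()
        total += targets.numel()
print(f'top-1 accuracy: {100.0 * correct / total:.1f}%')
```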
This repository is built using the timm library and the DAT and BiFormer repositories.
This project is released under the MIT license. Please see the LICENSE file for more information.
If you find this repository helpful, please consider citing:
@InProceedings{BaoLong_2024_ACCV,
author = {BaoLong, NguyenHuu and Zhang, Chenyu and Shi, Yuzhi and Hirakawa, Tsubasa and Yamashita, Takayoshi and Matsui, Tohgoroh and Fujiyoshi, Hironobu},
title = {DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention},
booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
month = {December},
year = {2024},
pages = {4455-4472}
}