Official PyTorch implementation of DeBiFormer, from the following paper:
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention. ACCV 2024.
Nguyen Huu Bao Long, Chenyu Zhang, Yuzhi Shi, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, and Tohgoroh Matsui
- 2024-09-21: The paper has been accepted at ACCV 2024!!!
name | resolution | acc@1 | #params | FLOPs | model | log |
---|---|---|---|---|---|---|
DeBiFormer-T | 224x224 | 81.9 | 21.4 M | 2.6 G | model | log |
DeBiFormer-S | 224x224 | 83.9 | 44 M | 5.4 G | model | log |
DeBiFormer-B | 224x224 | 84.4 | 77 M | 11.8 G | model | log |
First, clone the repository locally:
git clone https://github.com/maclong01/DeBiFormer.git
pip3 install -r requirements.txt
Download and extract ImageNet train and val images from http://image-net.org/.
The directory structure is the standard layout for the torchvision datasets.ImageFolder, and the training and validation data is expected to be in the train/ folder and val/ folder respectively:
/path/to/imagenet/
train/
class1/
img1.jpeg
class2/
img2.jpeg
val/
class1/
img3.jpeg
class2/
img4.jpeg
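Before launching a long training run, it can help to confirm that the extracted dataset actually matches the layout above. The following is a minimal sketch using only the Python standard library; it is not part of this repository. The helper names `summarize_split` and `check_imagenet_layout` are hypothetical, and the set of image suffixes is an assumption (torchvision's ImageFolder accepts additional extensions).

```python
from pathlib import Path

# Assumption: a small subset of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_split(split_dir):
    """Map each class folder name in split_dir to its number of image files."""
    counts = {}
    for class_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_imagenet_layout(root):
    """Return a list of layout problems; an empty list means none were found."""
    problems = []
    summaries = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split}/")
            continue
        summary = summarize_split(split_dir)
        summaries[split] = summary
        if not summary:
            problems.append(f"{split}/ has no class subfolders")
        for name, n_images in summary.items():
            if n_images == 0:
                problems.append(f"{split}/{name}/ contains no images")
    # Both splits should expose the same set of class folders.
    if len(summaries) == 2 and set(summaries["train"]) != set(summaries["val"]):
        problems.append("train/ and val/ have different class folders")
    return problems
```

Calling `check_imagenet_layout("/path/to/imagenet")` returns an empty list when both folders exist, every class folder holds at least one image, and the two splits share the same class names.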
To train DeBiFormer-S on ImageNet using 8 GPUs for 300 epochs, run:
cd classification/
bash train.sh 8 --model debiformer_small --batch-size 256 --lr 5e-4 --warmup-epochs 20 --weight-decay 0.1 --data-path your_imagenet_path
To evaluate the performance of DeBiFormer-S on ImageNet using 8 GPUs, run:
cd classification/
bash train.sh 8 --model debiformer_small --batch-size 256 --lr 5e-4 --warmup-epochs 20 --weight-decay 0.1 --data-path your_imagenet_path --resume ../checkpoints/debiformer_small_in1k_224.pth --eval
This repository is built using the timm library and the DAT and BiFormer repositories.
This project is released under the MIT license. Please see the LICENSE file for more information.
If you find this repository helpful, please consider citing:
@Article{baolong2024debiformer,
author = {NguyenHuu BaoLong and Chenyu Zhang and Yuzhi Shi and Takayoshi Yamashita and Tsubasa Hirakawa and Hironobu Fujiyoshi and Tohgoroh Matsui},
title = {DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention},
journal = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
year = {2024},
}