PolaFormer: Polarity-aware Linear Attention for Vision Transformers [ICLR 2025]

If you like our work, please support us with a star ⭐!

🚀 Welcome to the repo of PolaFormer!

This repo contains the official PyTorch code and pre-trained models for PolaFormer.


🔥 News

  • [2/4] 🔥 The Triton implementation of PolaFormer has been released, thanks to the fbi_la library.

  • [1/22] 🔥 Our paper has been accepted to the International Conference on Learning Representations (ICLR) 2025.

Introduction

Motivation

Linear attention has emerged as a promising alternative to softmax-based attention, leveraging kernelized feature maps to reduce complexity from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$ in sequence length. However, the non-negative constraint on feature maps and the relaxed exponential function used in approximation lead to significant information loss compared to the original query-key dot products, resulting in less discriminative attention maps with higher entropy. To address the missing interactions driven by negative values in query-key pairs and the high entropy, we propose the PolaFormer, which achieves a superior balance between expressive capability and efficiency.

Method

In this paper, we propose a polarity-aware linear attention mechanism that explicitly models both same-signed and opposite-signed query-key interactions, ensuring comprehensive coverage of relational information. Furthermore, to restore the spiky properties of attention maps, we prove that there exists a class of element-wise functions (with positive first and second derivatives) that reduce the entropy of the attention distribution. Finally, we employ a learnable power function for rescaling, allowing strong and weak attention signals to be effectively separated.

Notably, we introduce two learnable polarity-aware coefficient matrices, applied via element-wise multiplication, which are expected to learn the complementary relationship between same-signed and opposite-signed values.
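
For intuition, here is a minimal, self-contained PyTorch sketch of the idea. It is not the official implementation: the non-negative power feature map, the per-channel mixing parameters (`g_same`, `g_oppo`), and the way the two streams are combined are simplifying assumptions made for illustration only; please refer to the code in this repo for the exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolarityAwareLinearAttentionSketch(nn.Module):
    """Illustrative sketch of polarity-aware linear attention (not the official module)."""

    def __init__(self, dim, num_heads=8, alpha_init=3.0):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable exponent of the element-wise power function used for rescaling
        # (positive first and second derivatives -> lower-entropy, spikier attention).
        self.alpha = nn.Parameter(torch.tensor(alpha_init))
        # Hypothetical learnable polarity-aware coefficients for the two streams.
        self.g_same = nn.Parameter(torch.ones(num_heads, 1, self.head_dim))
        self.g_oppo = nn.Parameter(torch.ones(num_heads, 1, self.head_dim))

    def _feature_map(self, x):
        # Non-negative feature map with a learnable power for rescaling.
        return x.clamp(min=0.0).pow(F.softplus(self.alpha))

    def forward(self, x):
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (B, heads, N, head_dim).
        q = q.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        # Polarity decomposition: q = q_pos - q_neg, k = k_pos - k_neg.
        q_pos, q_neg = self._feature_map(q), self._feature_map(-q)
        k_pos, k_neg = self._feature_map(k), self._feature_map(-k)

        def linear_attn(qf, kf, v):
            # Associativity trick: compute (K^T V) first, O(N d^2) instead of O(N^2 d).
            kv = torch.einsum('bhnd,bhne->bhde', kf, v)
            z = 1.0 / (torch.einsum('bhnd,bhd->bhn', qf, kf.sum(dim=2)) + 1e-6)
            return torch.einsum('bhnd,bhde,bhn->bhne', qf, kv, z)

        # Same-signed (q+k+, q-k-) and opposite-signed (q+k-, q-k+) interactions.
        same = linear_attn(q_pos, k_pos, v) + linear_attn(q_neg, k_neg, v)
        oppo = linear_attn(q_pos, k_neg, v) + linear_attn(q_neg, k_pos, v)

        # Learnable coefficients mix the two complementary streams.
        out = self.g_same * same + self.g_oppo * oppo
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


# Example usage on a 14x14 token grid with embedding dim 384.
attn = PolarityAwareLinearAttentionSketch(dim=384, num_heads=6)
y = attn(torch.randn(1, 196, 384))  # -> (1, 196, 384)
```

The key point the sketch illustrates is that, because the feature maps are non-negative, the same-signed and opposite-signed products must be tracked as separate streams, and the `(K^T V)`-first ordering inside `linear_attn` is what keeps the cost linear in the sequence length.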

Results

  • Comparison of different models on ImageNet-1K.

  • Performance on the Long Range Arena benchmark.

| Model | Text | ListOps | Retrieval | Pathfinder | Image | Average |
|-------|------|---------|-----------|------------|-------|---------|
| $\text{PolaFormer}_{\alpha=3}$ | 73.06 | 37.35 | 80.50 | 70.53 | 42.15 | 60.72 |
| $\text{PolaFormer}_{\alpha=5}$ | 72.33 | 38.76 | 80.37 | 68.98 | 41.91 | 60.47 |
| $\text{PolaFormer}_{\alpha=7}$ | 71.93 | 37.60 | 81.47 | 69.09 | 42.77 | 60.57 |

Dependencies

  • Python 3.9
  • PyTorch == 1.11.0
  • torchvision == 0.12.0
  • numpy
  • timm == 0.4.12
  • einops
  • yacs

Data preparation

The ImageNet dataset should be prepared as follows:

$ tree data
imagenet
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img4.jpeg
    │   ├── img5.jpeg
    │   └── ...
    ├── class2
    │   ├── img6.jpeg
    │   └── ...
    └── ...
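
This layout follows torchvision's `ImageFolder` convention (one sub-directory per class). As a quick sanity check, the directory can be loaded as in the sketch below; the transform and paths are illustrative only and are not the exact pipeline used by the training scripts in this repo.

```python
import torchvision.datasets as datasets
import torchvision.transforms as T

# Illustrative preprocessing; the actual pipeline (built on timm/Swin configs) may differ.
transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
])

# Each sub-directory of train/ and val/ is treated as one class.
train_set = datasets.ImageFolder('imagenet/train', transform=transform)
val_set = datasets.ImageFolder('imagenet/val', transform=transform)
print(len(train_set.classes), 'classes,', len(train_set), 'training images')
```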

Pretrained Models

We provide pretrained models for several backbone architectures, as listed below; the number in parentheses is the Top-1 accuracy gain over the corresponding baseline backbone.

| Model | Resolution | Top-1 Acc. (%) | Config |
|-------|------------|----------------|--------|
| Pola-PVT-T | $224^2$ | 78.8 (+3.7) | config |
| Pola-PVT-S | $224^2$ | 81.9 (+2.1) | config |
| Pola-Swin-T | $224^2$ | 82.6 (+1.4) | config |
| Pola-Swin-S | $224^2$ | 83.6 (+0.6) | config |
| Pola-Swin-B | $224^2$ | 83.8 (+0.3) | config |

To evaluate a model on ImageNet, run:

python -m torch.distributed.launch --nproc_per_node=8 main.py --cfg <path-to-config-file> --data-path <imagenet-path> --output <output-path> --eval --resume <path-to-pretrained-weights>

Train Models from Scratch

  • To train our model on ImageNet from scratch, see pretrain.sh and run:
bash pretrain.sh

Acknowledgements

This code is developed on top of Swin Transformer and FLatten Transformer.

Citation

If you find this repo helpful, please consider citing us.

@inproceedings{meng2025polaformer,
  title={PolaFormer: Polarity-aware Linear Attention for Vision Transformers},
  author={Weikang Meng and Yadan Luo and Xin Li and Dongmei Jiang and Zheng Zhang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=kN6MFmKUSK}
}

Contact

If you have any questions, please feel free to contact the authors.

Weikang Meng: [email protected]
