We present a novel modeling framework that recasts adapter tuning after attention as a message-passing process on attention graphs, where the projected query and value features serve as the node features and the attention matrix serves as the graph adjacency matrix. Within this framework, tuning adapters in VLMs requires handling heterophilic graphs, owing to the disparity between the projected query and value spaces.
To address this challenge, we propose a new adapter architecture, p-adapter, based on p-Laplacian message passing.
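Under this view, the standard attention output A V is exactly one round of message passing with adjacency A over value features V, and the adapter acts on the aggregated features. The following is a minimal, self-contained PyTorch sketch of what a p-Laplacian-style adapter after attention could look like; it is illustrative only, not the official implementation, and the class name, bottleneck design, and hyperparameters (`p`, `bottleneck_dim`, `eps`) are assumptions for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PAdapterSketch(nn.Module):
    """Illustrative sketch (NOT the official p-Adapter): an adapter after
    attention viewed as one p-Laplacian message-passing step. A is the
    attention matrix (graph adjacency), V the projected value features
    (node features). Edges are re-weighted by ||x_i - x_j||^(p-2), which
    adapts aggregation to heterophilic node pairs."""

    def __init__(self, dim: int, bottleneck_dim: int = 64,
                 p: float = 1.5, eps: float = 1e-6):
        super().__init__()
        self.p, self.eps = p, eps
        self.down = nn.Linear(dim, bottleneck_dim)  # standard bottleneck adapter
        self.up = nn.Linear(bottleneck_dim, dim)

    def forward(self, A: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
        # A: (B, N, N) row-stochastic attention, V: (B, N, d) node features
        diff = V.unsqueeze(-2) - V.unsqueeze(-3)                 # (B, N, N, d)
        w = diff.norm(dim=-1).clamp_min(self.eps) ** (self.p - 2)
        A_p = F.normalize(A * w, p=1, dim=-1)                    # re-weight + row-normalize
        msg = A_p @ V                                            # message-passing aggregation
        return msg + self.up(F.relu(self.down(msg)))             # residual bottleneck adapter

# Usage with made-up shapes: batch of 2 sequences, 16 tokens, width 32
A = torch.softmax(torch.randn(2, 16, 16), dim=-1)
V = torch.randn(2, 16, 32)
print(PAdapterSketch(dim=32)(A, V).shape)  # torch.Size([2, 16, 32])
```

Note that for p = 2 the re-weighting term is constant, so the step reduces to standard attention aggregation followed by a vanilla bottleneck adapter.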
This is the official PyTorch implementation of p-Adapter.
# Download the pretrained BLIP and bert_base_uncased models into "./download_model/"
# Install Python dependencies
pip install -r requirements.txt
# Download the COCO dataset from the original website into "./dataset/"
bash train_coco_caption.sh
# Download the SNLI_VE dataset from the original website into "./dataset/"
bash train_snli_ve.sh
# Download the VQA v2 dataset from the original website into "./dataset/"
bash train_vqa.sh
This repo is adapted from [BLIP](https://github.com/salesforce/BLIP).
@article{wu2023p,
title={p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models},
author={Wu, Haoyuan and Zhang, Xinyun and Xu, Peng and Liao, Peiyu and Yao, Xufeng and Yu, Bei},
journal={arXiv preprint arXiv:2312.10613},
year={2023}
}