Welcome! This is the official implementation of our paper, "Mitigating Spurious Correlations in Zero-Shot Multimodal Models", which has been accepted at ICLR 2025.
In this work, we propose a method that tackles spurious correlations in vision-language models (VLMs) in the zero-shot setting. Our approach applies a translation operation in the latent space, which preserves the distribution of the embeddings while mitigating spurious correlations. The method is motivated by a theoretical analysis showing that the optimal translation directions lie along the spurious vectors. Because VLMs unify two modalities, we compute the spurious vectors from text prompts and use them to guide the translation of image embeddings, in line with how VLMs fuse the two modalities.
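The idea of translating image embeddings along text-derived spurious vectors can be illustrated with a minimal NumPy sketch. This is not the code from the notebooks: the function names, the toy random embeddings, and the choice of shift size (the mean projection of each spurious group onto its unit spurious vector) are assumptions made for illustration only.

```python
import numpy as np


def unit(v):
    """Scale vectors to unit L2 norm along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def translate_along_spurious(image_emb, spurious_labels, spurious_text_emb):
    """Shift each spurious group of image embeddings along its spurious vector.

    image_emb:         (n, d) image embeddings
    spurious_labels:   (n,) integer spurious-attribute label per image
    spurious_text_emb: (k, d) text embeddings of the k spurious prompts
    """
    directions = unit(spurious_text_emb)  # one unit spurious vector per attribute
    translated = image_emb.astype(float).copy()
    for a, v_a in enumerate(directions):
        mask = spurious_labels == a
        if not mask.any():
            continue
        # Assumed shift size: mean projection of the group onto v_a.
        lam = (translated[mask] @ v_a).mean()
        # Every embedding in the group receives the same shift (a translation).
        translated[mask] -= lam * v_a
    return translated


# Toy data standing in for real CLIP text and image embeddings.
rng = np.random.default_rng(0)
d = 8
spurious_text_emb = rng.normal(size=(2, d))
labels = np.array([0, 0, 0, 1, 1, 1])
image_emb = rng.normal(size=(6, d)) + 2.0 * unit(spurious_text_emb)[labels]

shifted = translate_along_spurious(image_emb, labels, spurious_text_emb)
```

Because all embeddings in a group are moved by the same vector, the differences between embeddings within a group are unchanged; only the group's average component along its spurious direction is removed.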
- open_clip
- requirements (in requirements.txt)
- Install required packages:
pip install -r requirements.txt
- Install open_clip_torch:
pip install open_clip_torch
- CelebA Dataset
  - Download from: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
  - Place in the directory above the repository root
- Waterbirds Dataset
  - Download from: https://github.com/p-lambda/wilds
  - Place in the directory above the repository root
- ISIC Dataset
  - Run the following code snippet to download and extract the dataset:
import os
import zipfile

import gdown

data_root = '..'  # Set your ROOT directory
os.makedirs(data_root, exist_ok=True)
output = 'isic.zip'
url = 'https://drive.google.com/uc?id=1Os34EapIAJM34DrwZMw2rRRJij3HAUDV'
archive_path = os.path.join(data_root, output)

# Download and extract only if the dataset folder is not already present
if not os.path.exists(os.path.join(data_root, 'isic')):
    gdown.download(url, archive_path, quiet=False)
    with zipfile.ZipFile(archive_path, 'r') as zip_ref:
        zip_ref.extractall(data_root)
- COVID-19 Dataset
  - Download from: https://github.com/ieee8023/covid-chestxray-dataset
- FMoW Dataset
  - FMoW can be downloaded through the WILDS package (https://github.com/p-lambda/wilds)
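For the WILDS-hosted data, a download can be scripted roughly as follows. This is a sketch that assumes the `wilds` package is installed (`pip install wilds`); the helper name `download_wilds` and the default `root_dir` are hypothetical, and whether the notebooks expect exactly the folder layout that WILDS creates is not confirmed here.

```python
def download_wilds(dataset="fmow", root_dir=".."):
    """Download a WILDS dataset into root_dir and return the dataset object.

    The import is done lazily so this helper can be defined even when the
    wilds package is not installed yet.
    """
    from wilds import get_dataset

    # download=True fetches the data only if it is not already in root_dir.
    return get_dataset(dataset=dataset, root_dir=root_dir, download=True)


# Example call (FMoW is a large download, so it is left commented out):
# fmow = download_wilds("fmow", root_dir="..")
```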
To run TIE, we provide Jupyter notebooks (.ipynb files) that retain the outputs of previous runs for reference. Ensure you have an active Conda environment before proceeding. To reproduce the TIE results for a dataset, execute the corresponding notebook:
- Waterbirds: Run Table1-WB.ipynb
- CelebA: Run Table2-CelebA.ipynb
- ISIC: Run Table3-ISIC.ipynb
- COVID-19: Run Table3-Covid.ipynb
To enable TIE*, modify the following line in the code:
a = True
Change it to
a = False
With a = False, the code no longer uses the given spurious labels. Instead, it relies on the model's zero-shot capability to infer them, which switches the method to TIE*.
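Inferring a spurious label in a zero-shot manner can be sketched as a nearest-prompt lookup: each image is assigned the spurious attribute whose text prompt embedding is most similar to its image embedding. The function name and the toy three-dimensional embeddings below are assumptions for illustration and are not taken from the notebooks.

```python
import numpy as np


def infer_spurious_labels(image_emb, spurious_text_emb):
    """Assign each image the spurious attribute with the most similar prompt."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = spurious_text_emb / np.linalg.norm(spurious_text_emb, axis=1, keepdims=True)
    similarity = img @ txt.T  # (n_images, n_prompts) cosine similarities
    return similarity.argmax(axis=1)


# Toy embeddings: two spurious prompts and three images.
spurious_text_emb = np.array([[1.0, 0.0, 0.0],
                              [0.0, 1.0, 0.0]])
image_emb = np.array([[0.9, 0.1, 0.3],
                      [0.2, 0.8, 0.1],
                      [0.7, 0.6, 0.0]])

pseudo_labels = infer_spurious_labels(image_emb, spurious_text_emb)
```

The resulting pseudo-labels would then take the place of the given spurious labels when choosing which spurious vector to translate along.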
If you want to use a different CLIP model, update the following lines in the code:
Current (Default) CLIP Model – ViT-L/14
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained='laion2b_s32b_b82k')  # ViT-L/14
model = model.to(device)
tokenizer = open_clip.get_tokenizer('ViT-L-14')
Switching to CLIP ViT-B/32
To use the ViT-B/32 model instead, comment out the above lines and uncomment the following:
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained='openai')  # ViT-B/32
model = model.to(device)
tokenizer = open_clip.get_tokenizer('ViT-B-32')
Both TIE and TIE* support this modification.
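As an alternative to commenting lines in and out, the two configurations above could be kept in a small lookup table. This is a sketch rather than code from the notebooks: `CLIP_CONFIGS` and `load_clip` are hypothetical names, while the architecture and pretrained tags are the ones listed in this README.

```python
# Checkpoints named in this README, keyed by the usual CLIP model name.
CLIP_CONFIGS = {
    "ViT-L/14": {"arch": "ViT-L-14", "pretrained": "laion2b_s32b_b82k"},
    "ViT-B/32": {"arch": "ViT-B-32", "pretrained": "openai"},
}


def load_clip(name="ViT-L/14", device="cpu"):
    """Return (model, preprocess, tokenizer) for one of the configs above."""
    import open_clip  # imported lazily; weights are downloaded on first use

    cfg = CLIP_CONFIGS[name]
    model, _, preprocess = open_clip.create_model_and_transforms(
        cfg["arch"], pretrained=cfg["pretrained"]
    )
    model = model.to(device).eval()
    tokenizer = open_clip.get_tokenizer(cfg["arch"])
    return model, preprocess, tokenizer


# Example call (downloads weights, so it is left commented out):
# model, preprocess, tokenizer = load_clip("ViT-B/32", device="cpu")
```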
We sincerely thank the contributors of the open-source repositories that have supported this project.
If you find this work interesting or useful, please consider citing it:
@inproceedings{lu2025mitigating,
  title={Mitigating Spurious Correlations in Zero-Shot Multimodal Models},
  author={Shenyu Lu and Junyi Chai and Xiaoqian Wang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=UsRKFYR4lM}
}