Official code for our paper: Interpreting and Analyzing CLIP's Zero-Shot Image Classification via Mutual Knowledge, NeurIPS 2024
To keep this repo clean and readable, only the code for a single example is provided. If you need code for other analyses, ablations, experiments, or datasets, please open an issue or email me, and I will provide it directly.
To avoid running the CLIP text encoder on all classes and descriptors every time, we save its output as the weights of an nn.Linear layer for easy and convenient loading. Please download the ImageNet classifiers from here and place them in the directory of this repo.
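
For readers unfamiliar with this caching trick, here is a minimal sketch of the idea, not the exact code used in this repo. It assumes the text embeddings are a `(num_classes, dim)` tensor and uses random values in place of real CLIP outputs. The helper `build_classifier`, the file name `example_classifier.pt`, and the sizes 1000 and 512 are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.nn.functional as F


def build_classifier(text_embeddings: torch.Tensor) -> nn.Linear:
    """Store precomputed text embeddings as the weight of a bias-free nn.Linear.

    text_embeddings: (num_classes, dim) tensor, one row per class or descriptor.
    """
    num_classes, dim = text_embeddings.shape
    classifier = nn.Linear(dim, num_classes, bias=False)
    with torch.no_grad():
        # L2-normalize rows so that the layer output is a cosine similarity
        # whenever the input image features are also L2-normalized.
        classifier.weight.copy_(F.normalize(text_embeddings, dim=-1))
    return classifier


torch.manual_seed(0)

# Stand-in for the CLIP text encoder output (assumption: 1000 classes, 512-d).
text_embeddings = torch.randn(1000, 512)
classifier = build_classifier(text_embeddings)

# Save the weights once, then reload them without touching the text encoder.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_classifier.pt")  # hypothetical name
    torch.save(classifier.state_dict(), path)

    reloaded = nn.Linear(512, 1000, bias=False)
    reloaded.load_state_dict(torch.load(path))

# Stand-in for normalized CLIP image features of a batch of 4 images.
image_features = F.normalize(torch.randn(4, 512), dim=-1)

with torch.no_grad():
    logits = reloaded(image_features)   # (4, 1000) cosine similarities
predictions = logits.argmax(dim=-1)     # zero-shot class index per image
```

With this layout, zero-shot classification reduces to a single matrix multiplication between the image features and the stored text embeddings, so the text encoder is needed only when the classifier file is first created.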