Updated on 2023.06.08
This is the repository for the ICLR 2023 accepted paper "Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study".
- We propose a prompt design paradigm that includes expressive attributes in the prompts. We show that such prompts help pre-trained Vision-Language Models (VLMs) adapt rapidly to unseen medical domain datasets.
- We further propose three approaches for automatic prompt generation, which leverage either specialized Language Models (LMs) or VQA models to obtain the attributes.
- Our methods are evaluated on various public medical datasets. For more details, please refer to the paper.
ZeroShot Results
Our methods show superiority under zero-shot settings. Due to licensing restrictions, we cannot share all the datasets used in our work, but we have uploaded the polyp benchmark datasets as a sample. The polyp datasets were prepared by the PraNet project; you can also download the data here. If you want to use your own dataset, please follow the layout of the polyp datasets when organizing your data paths and annotation files.
Netdisk Type | Link | Password(optional) |
---|---|---|
BaiduNetDisk | link | s2nf |
Google Drive | link | N/A |
After you download this zip file, please unzip it and place the folder at the project path.
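If you prefer to do the extraction from Python, the step above can be sketched as follows. This is a minimal sketch using only the standard library; the helper name extract_dataset and the archive file name in the usage note are assumptions, since the actual name of the downloaded zip file is not stated here.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_dataset(archive: str, project_root: str = ".") -> List[str]:
    """Extract the archive into project_root and return its top-level entries."""
    root = Path(project_root)
    with zipfile.ZipFile(archive) as zf:
        # Unpack everything directly under the project path.
        zf.extractall(root)
        # Collect the first path component of every archive member.
        top_level = sorted({name.split("/")[0] for name in zf.namelist()})
    return top_level
```

For example, extract_dataset("polyp_datasets.zip") (a hypothetical file name) unpacks the archive into the current directory and returns the names of the folders it created, which you can compare against the data paths in your config file.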
We also provide an interactive Space on Hugging Face for quickly trying out our approach. Please check this link for the interactive demo page.
You can also check this Colab script for code and training details.
Main Requirements
Our project is based on the GLIP project, so please first set up the environment for the GLIP model following this instruction. Please make sure you download the GLIP-T model weights here and put them under the MODEL/ path.
Next, please clone this repository and continue the installation guide in the next section.
Installation
git clone https://github.com/MembrAI/MIU-VL.git
# Enter the cloned repository before installing its requirements.
cd MIU-VL
pip install -r requirements.txt
We follow the config file format used in the GLIP project. Please refer to the sample config file we provide when creating your own. Note: the DATASETS.CAPTION_PROMPT content is ignored by our code, because our code uses the automatically generated prompts instead of a user-supplied prompt.
Generate prompts with the Masked Language Model (MLM) method
In our work, we propose three methods to automatically generate prompts with expressive attributes. The first is the MLM method. To generate prompts with this approach, we use a pre-trained language model as the knowledge source. In this project, we use the BiomedNLP-PubMedBERT-base model as our specialized language model. Please use the following commands to download the model into this repo:
git lfs install
git clone https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext
After you set up the dependencies needed for automatic prompt generation with the MLM method, you can generate prompts for your dataset. Note that our code currently only supports generating prompts with three expressive attributes: color, shape, and location. We may extend the code to support more kinds of attributes in the future, but we found these three to be the most useful so far.
Now, run the following command to get the prompts generated with our MLM method.
bash RUN/autoprompt/make_auto_mlm.sh
or
python make_autopromptsv2.py --dataset 'kvasir' \
--cls_names 'polyp' \
--vqa_names 'wound' \
--mode 'lama' \
--real_cls_names 'bump'
where:
- --dataset is the dataset name used to look up the related data paths.
- --cls_names is the class name inserted into the template used to extract attribute information from the language model. In this example, we ask the LM to predict the masked word in the template "The typical color of polyp is [MASK] color", and the LM predicts the [MASK] token given the class name.
- --vqa_names is similar to --cls_names, except that it is used when querying the VQA models later.
- --mode decides which automated generation approach is used; 'lama' refers to the MLM method.
- --real_cls_names is the real class name that is put into the final prompt. We found that substituting general vocabulary for specialized terminology can improve performance. For example, we use bump instead of polyp in our final prompts and observe a significant improvement.
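The template query described for --cls_names can be sketched in Python as follows. This is a minimal sketch, not the code in make_autopromptsv2.py. Only the color template is quoted from the description above; the shape and location templates and the helper names build_templates and query_attribute are assumptions made for illustration. It also assumes the PubMedBERT checkpoint has been cloned into the working directory and that the transformers package is installed.

```python
from typing import Dict, List

# Local directory created by the `git clone` of the PubMedBERT checkpoint.
MODEL_DIR = "BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"


def build_templates(cls_name: str) -> Dict[str, str]:
    """Return one masked template per attribute for the given class name."""
    return {
        # Quoted from the description of --cls_names.
        "color": f"The typical color of {cls_name} is [MASK] color",
        # Assumed wording, written by analogy with the color template.
        "shape": f"The typical shape of {cls_name} is [MASK] shape",
        "location": f"The typical location of {cls_name} is in [MASK]",
    }


def query_attribute(template: str, top_k: int = 1) -> List[str]:
    """Ask the masked language model for the top_k fillers of [MASK]."""
    # Imported lazily so build_templates works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_DIR)
    predictions = fill_mask(template, top_k=top_k)
    return [p["token_str"].strip() for p in predictions]
```

Calling query_attribute(build_templates("polyp")["color"]) would return the model's most likely filler for the color template. How the predicted attributes are then combined with --real_cls_names into a prompt is handled by the repository code and is not reproduced here.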
After running the command above, you will find several .json files saved in the 'autoprompt_json/' folder. These JSON files store the generated prompts for each input image. To run the final inference code, use the following script:
#!/bin/bash
# The config and output paths below are placeholders; replace them with your own.
config_file=path/to/config/file.yaml
odinw_configs=path/to/config/file.yaml
output_dir=output/path
model_checkpoint=MODEL/glip_tiny_model_o365_goldg.pth
# Prompt file from the MLM ('lama') step.
jsonFile=autoprompt_json/lama_kvasir_path_prompt_top1.json
# Each continued line needs a space before the trailing backslash.
python test_vqa.py --json "${jsonFile}" \
--config-file "${config_file}" --weight "${model_checkpoint}" \
--task_config "${odinw_configs}" \
OUTPUT_DIR "${output_dir}" \
TEST.IMS_PER_BATCH 2 SOLVER.IMS_PER_BATCH 2 \
TEST.EVAL_TASK detection \
DATASETS.TRAIN_DATASETNAME_SUFFIX _grounding \
DATALOADER.DISTRIBUTE_CHUNK_AMONG_NODE False \
DATASETS.USE_OVERRIDE_CATEGORY True \
DATASETS.USE_CAPTION_PROMPT True
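If you would rather launch inference from Python, the same call can be assembled as an argument list, which avoids shell line-continuation pitfalls. This is a minimal sketch: the helper names build_inference_command and run_inference are assumptions, while the flags and config overrides are copied from the script above.

```python
import subprocess
from typing import List


def build_inference_command(json_file: str, config_file: str, task_config: str,
                            weight: str, output_dir: str) -> List[str]:
    """Assemble the test_vqa.py call as a list of separate arguments."""
    return [
        "python", "test_vqa.py",
        "--json", json_file,
        "--config-file", config_file,
        "--weight", weight,
        "--task_config", task_config,
        # Config overrides, passed as alternating KEY VALUE pairs.
        "OUTPUT_DIR", output_dir,
        "TEST.IMS_PER_BATCH", "2",
        "SOLVER.IMS_PER_BATCH", "2",
        "TEST.EVAL_TASK", "detection",
        "DATASETS.TRAIN_DATASETNAME_SUFFIX", "_grounding",
        "DATALOADER.DISTRIBUTE_CHUNK_AMONG_NODE", "False",
        "DATASETS.USE_OVERRIDE_CATEGORY", "True",
        "DATASETS.USE_CAPTION_PROMPT", "True",
    ]


def run_inference(command: List[str]) -> None:
    """Run the assembled command and raise if it exits with an error."""
    subprocess.run(command, check=True)
```

To use it, build the command with your own config paths, the GLIP-T checkpoint under MODEL/, and one of the files in autoprompt_json/, then pass the result to run_inference from the project root.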
Generate image-specific prompts with the VQA and Hybrid methods
Our approach uses the OFA model for visual question answering, so you need to follow this guide to install the OFA module with Hugging Face transformers. Note: we use the OFA-base model in this project. For your convenience, you can simply run the following commands to install the OFA model with Hugging Face transformers, but we recommend consulting the user guide in case of any problems.
git clone --single-branch --branch feature/add_transformers https://github.com/OFA-Sys/OFA.git
pip install OFA/transformers/
git clone https://huggingface.co/OFA-Sys/OFA-base
Here, we show how to generate the auto-prompt JSON files with the hybrid method.
python make_autopromptsv2.py --dataset 'kvasir' \
--cls_names 'polyp' \
--vqa_names 'bump' \
--mode 'hybrid' \
--real_cls_names 'bump'
or run the pre-defined bash file
bash RUN/autoprompt/make_auto_hybrid.sh
As mentioned above, the --mode argument decides which approach is used for prompt generation: 'hybrid' and 'vqa' activate the hybrid and VQA methods respectively.
Note: running the hybrid or VQA method takes hours to produce the prompt file. We recommend using a GPU with at least 24 GB of memory to run this script.
Again, you will obtain several JSON files containing the auto-prompts generated by our approach for each input image. If you cannot run the script above due to GPU limitations, we also provide some sample files under the autoprompt_json path. You can use these files as references and run the following script to perform inference with the generated prompts:
# The config and output paths below are placeholders; replace them with your own.
config_file=path/to/config/file.yaml
odinw_configs=path/to/config/file.yaml
output_dir=output/path
model_checkpoint=MODEL/glip_tiny_model_o365_goldg.pth
# Prompt file from the hybrid step (or one of the provided sample files).
jsonFile=autoprompt_json/hybrid_kvasir_path_prompt_top1.json
# Each continued line needs a space before the trailing backslash.
python test_vqa.py --json "${jsonFile}" \
--config-file "${config_file}" --weight "${model_checkpoint}" \
--task_config "${odinw_configs}" \
OUTPUT_DIR "${output_dir}" \
TEST.IMS_PER_BATCH 2 SOLVER.IMS_PER_BATCH 2 \
TEST.EVAL_TASK detection \
DATASETS.TRAIN_DATASETNAME_SUFFIX _grounding \
DATALOADER.DISTRIBUTE_CHUNK_AMONG_NODE False \
DATASETS.USE_OVERRIDE_CATEGORY True \
DATASETS.USE_CAPTION_PROMPT True
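Before running inference, it can help to check what a prompt file in autoprompt_json/ actually contains. The sketch below makes no assumption about the file's schema, which is not documented here: it only reports the top-level JSON type, the number of entries, and a small sample. The helper name summarize_json is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_json(path: str, max_items: int = 3) -> Dict[str, Any]:
    """Load a JSON file and report its top-level type, size, and a small sample."""
    data: Any = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        # Keep the first few key/value pairs in file order.
        sample: Any = dict(list(data.items())[:max_items])
    elif isinstance(data, list):
        sample = data[:max_items]
    else:
        # Scalars (string, number, ...) are returned as they are.
        sample = data
    size = len(data) if isinstance(data, (dict, list)) else 1
    return {"type": type(data).__name__, "size": size, "sample": sample}
```

For example, summarize_json("autoprompt_json/hybrid_kvasir_path_prompt_top1.json") shows whether the prompts are keyed by image path or stored as a list, and lets you confirm the entry count against the number of images in your dataset.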
We also fine-tuned the GLIP model on the medical data we collected. For convenience, we have uploaded all the checkpoints here for those who want to replicate our results.
Non-radiology Checkpoints
Dataset | Weights | Password |
---|---|---|
Polyp | Link | 1f8e |
CPM17 | Link | ywyc |
BCCD | Link | 4wrb |
ISIC2016 | Link | j7fc |
DFUC2020 | Link | pbir |
Radiology Dataset and Checkpoints
Dataset | Weights | Password |
---|---|---|
LUNA16 | Link | tg5h |
ADNI | Link | dptg |
TN3k | Link | 596i |
TBX11k | Link | tv9s |
Fine-Tune Results
- Email: [email protected]
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
Our code is adapted from the GLIP project. We also use OFA and PubMedBERT for auto-prompt generation. Thanks for their excellent work.
If you find this repository useful, please consider citing this paper:
@article{Qin2022MedicalIU,
title={Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study},
author={Ziyuan Qin and Huahui Yi and Qicheng Lao and Kang Li},
journal={ArXiv},
year={2022},
volume={abs/2209.15517}
}