HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding


This repository provides the official PyTorch implementation of the following ICML 2024 paper:

HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen1,*, Zhuokai Zhao2,*, Lingyu Gao3, Hongyin Luo4, Huaxiu Yao1, Jiawei Zhou3

1UNC-Chapel Hill, 2University of Chicago, 3Toyota Technological Institute at Chicago, 4Massachusetts Institute of Technology
* Equal contribution

🥳 Features

Currently supported online OH (object hallucination) decoding methods:

| Decoder | Minigpt4-v2 | Instructblip | LLaVA-1.5 | mPLUG-OWL2 |
| --- | --- | --- | --- | --- |
| Greedy* | ✅ | ✅ | ✅ | ✅ |
| HALC* | ✅ | ✅ | ✅ | ✅ |
| OPERA-Beam | ✅ | ✅ | ✅ | ✅ |
| VCD | ✅ | ✅ | ✅ | ✅ |
| DoLa* | ✅ | ✅ | ✅ | ✅ |

*: indicates the method supports beam search.

Currently supported post-hoc methods:

| Post-hoc | Minigpt4-v2 | Instructblip | LLaVA-1.5 | mPLUG-OWL2 |
| --- | --- | --- | --- | --- |
| Woodpecker | ✅ | ✅ | ✅ | ✅ |
| LURE | ✅ | ✅ | ✅ | ✅ |

🛠️ Installation

To install the required packages, run:

git clone https://github.com/BillChan226/HALC.git
cd HALC
conda env create -f environment.yml
conda activate halc

We employ Grounding DINO as the external detector to localize and bound hallucinatory objects. We have simplified the installation process; to install GroundingDINO with CUDA support:

# set CUDA_HOME to the virtual environment halc
export CUDA_HOME=$CONDA_PREFIX
# install GroundingDINO
cd decoder_zoo/GroundingDINO
pip install -e .
# go back to HALC root
cd ../..

To download pre-trained model weights for DINO:

# default directory that contains the weights
mkdir model_checkpoints
cd model_checkpoints
# download weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
# go back to HALC root
cd ..

🐝 LVLM Backbones

The following evaluation requires the MSCOCO 2014 dataset. Please download it here and extract it into your data path.

In addition, you need to prepare the checkpoints of the 7B base models for the LVLM backbones above:

Arguments

| Argument | Example | Description |
| --- | --- | --- |
| --model | llava-1.5 | Specify the MLLM model; this codebase supports instructblip, minigpt4, llava-1.5. |
| --data-path | /path/to/dataset | Path to the dataset file or folder, e.g., COCO_2014/val2014/. |
| --pope-type | random | Type for POPE evaluation; supports random, popular, adversarial. |
| --beam | 3 | Beam size for global search. Default: 1. |
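For example, a run combining these shared arguments could look like the following (illustrative paths and values only; the exact scripts are documented in the benchmark commands below, and python chair_eval.py -h lists the full argument set):

python chair_eval.py --model llava-1.5 --data-path /path/to/COCO_2014/val2014/ -d halc --beam 3 --num_samples 500 --seed 42 --gpu-id 0 --output_dir ./generated_captions/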

Arguments for HALC

| Argument | Example | Description |
| --- | --- | --- |
| --k-candidate-num | 4 | Number of generative focal fields for local search. Default: 4. |
| --expand-ratio | 0.6 | The growing factor of focal fields. Default: 0.6. |
| --detector | dino | Detector to use, one of [dino, owlv2]. Default: dino. |
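As an illustrative sketch (not a prescribed configuration), the HALC-specific flags are simply appended to an evaluation command that selects -d halc:

python chair_eval.py --model llava-1.5 --data-path /path/to/COCO_2014/val2014/ -d halc --k-candidate-num 4 --expand-ratio 0.6 --detector dino --gpu-id 0 --output_dir ./generated_captions/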

Arguments for OPERA

| Argument | Example | Description |
| --- | --- | --- |
| --scale_factor | 50 | The scale factor to scale up the self-attention weights. Default: 50. |
| --threshold | 15 | The threshold for attending retrospection. Default: 15. |
| --num_attn_candidates | 5 | The number of candidates per beam. Default: 5. |
| --penalty_weights | 1 | The weight of the penalty term in decoding. Default: 1. |
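Similarly, an illustrative OPERA run (assuming opera is the value accepted by -d; check the script's -h output for the exact name) passes these flags alongside the shared arguments:

python chair_eval.py --model llava-1.5 --data-path /path/to/COCO_2014/val2014/ -d opera --beam 3 --scale_factor 50 --threshold 15 --num_attn_candidates 5 --penalty_weights 1 --gpu-id 0 --output_dir ./generated_captions/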

Arguments for VCD

| Argument | Example | Description |
| --- | --- | --- |
| --cd-alpha | 1 | Amplification factor. Default: 1. |
| --cd-beta | 0.1 | Truncation factor for the adaptive plausibility constraint. Default: 0.1. |
| --noise-step | 500 | Number of steps to add diffusion noise. Default: 500. |
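And an illustrative VCD run (again assuming vcd is the value accepted by -d):

python chair_eval.py --model llava-1.5 --data-path /path/to/COCO_2014/val2014/ -d vcd --cd-alpha 1 --cd-beta 0.1 --noise-step 500 --gpu-id 0 --output_dir ./generated_captions/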

⌛ Benchmarks Evaluation

🪑 CHAIR Evaluation of LVLMs' Object Hallucination

Running LVLM to generate captions and result file format-ready for CHAIR

Following Evaluating Object Hallucination in Large Vision-Language Models, we use "Generate a short caption of the image" as the prompt to query the LVLM for captions of the 2,000 images randomly sampled from the COCO 2014 Val dataset. Under the root directory, run

python chair_eval.py --model [LVLM Backbone] --data-path [COCO_DIR] -d [Decoding Strategy] --num_samples 500 --seed [SEED] --gpu-id [GPU_IDs] --output_dir ./generated_captions/

For a full list of command-line arguments, run python generate_chair_input.py -h. Note that [COCO_DIR] is expected to contain both the images and the annotation files within the annotations subfolder. In other words, [COCO_DIR] should have the following structure:

COCO_DIR (val2014 for example)
  - annotations
    - captions_train2014.json
    - captions_val2014.json
    - instances_train2014.json
    - instances_val2014.json
    - person_keypoints_train2014.json
    - person_keypoints_val2014.json
  - COCO_val2014_000000000042.jpg
  - COCO_val2014_000000000073.jpg
  ...

Upon completion, two files, minigpt4_pretrain-llama2_coco_2000_generated_captions.json and minigpt4_pretrain-llama2_coco_2000_chair.json, should be generated under generated_captions/minigpt4_pretrain-llama2/coco/ if llama2 is the model_type used for minigpt4.

🤵‍♂️ POPE Evaluation of LVLMs' Object Hallucination

Running LVLM to generate captions and result file format-ready for POPE

Under the root directory, run

python pope_eval.py --model [LVLM Backbone] --data_path [COCO_DIR] -d [Decoding Strategy] --pope_type [random/popular/adversarial] --num_images 100 --seed [SEED] --gpu_id [GPU_IDs] --output_dir ./generated_captions/
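For example, with illustrative values filled in (confirm flag names with python pope_eval.py -h):

python pope_eval.py --model llava-1.5 --data_path /path/to/COCO_2014/val2014/ -d halc --pope_type random --num_images 100 --seed 42 --gpu_id 0 --output_dir ./generated_captions/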

Running Post-hoc Methods to Revise Captions for CHAIR

Under the root directory, run

python reviser_eval.py -r [woodpecker/lure] --data_path [COCO_DIR] --c [PATH_TO_CAPTION] --seed [SEED] --gpu-id [GPU_IDs] --output_dir ./log/
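For example, to revise the MiniGPT-4 captions generated in the CHAIR step above with LURE (illustrative paths):

python reviser_eval.py -r lure --data_path /path/to/COCO_2014/val2014/ --c ./generated_captions/minigpt4_pretrain-llama2/coco/minigpt4_pretrain-llama2_coco_2000_generated_captions.json --seed 42 --gpu-id 0 --output_dir ./log/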

Evaluation

CHAIR Evaluation

We use the generated _chair.json file, for example minigpt4_pretrain-llama2_coco_2000_chair.json, for the CHAIR evaluation. Under the root directory, run

python eval_hallucination.py --metric chair --chair_input_path [PATH_TO_.JSON_FILE] -v
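For example, using the file produced by the MiniGPT-4 run described above:

python eval_hallucination.py --metric chair --chair_input_path ./generated_captions/minigpt4_pretrain-llama2/coco/minigpt4_pretrain-llama2_coco_2000_chair.json -v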

POPE Evaluation

python eval_hallucination.py --metric pope --pope_answer_path [PATH_TO_MODEL_OUTPUT] --pope_question_path [PATH_TO_.POPE_QUESTION] -v

The evaluation results will be printed to the terminal.

🎒 Demo Playgrounds

🦅 HALC Demo

Run the CDL demo on a toy example:

python context_density/context_decoding.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml  --gpu-id 0

ViT Early Exit Layers Demo

Specify early_exit_layer_idx, then run the ViT early-exit-layers contrastive decoding demo:

python vit_early_exit_contrast.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml  --gpu-id 0

DoLa Demo

To test DoLa with purely textual input, run:

python toy_dola_eval.py --model-name ./models/models--meta-llama--Llama-2-7b-chat-hf/snapshots/94b07a6e30c3292b8265ed32ffdeccfdadf434a8 --output-path output-path.json --num-gpus 1 --early-exit-layers 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32

Note: including 32 in --early-exit-layers is crucial for reasonable output.

The JSD for each candidate layer is printed; see line 2720 of DoLa/transformers-4.28.1/src/transformers/generation/utils.py.

To test DoLa with visual-textual input, run a toy example:

python contrast_decoding.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml  --gpu-id 0

The toy example is projected into the prefix of the language model as a context.

🔧 Troubleshooting

Error installing GroundingDINO

If the error NameError: name '_C' is not defined is reported, refer to this issue for a quick fix.

Error installing pattern

conda install -c conda-forge pattern

CUDA Error installing GroundingDINO

conda install pytorch torchvision torchaudio pytorch-cuda=[YOUR NVIDIA CUDA VERSION] -c pytorch -c nvidia

RuntimeError: Input type (float) and bias type (c10::Half) should be the same

Simply reinstalling torch==2.0.0 will most likely solve the issue:

pip uninstall torch
pip install torch==2.0.0

🔑 License

This repository is released under the BSD 3-Clause License. Much of the code is based on Lavis, which is also released under the BSD 3-Clause License.
