This repository provides the official PyTorch implementation of the following paper:
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen1,*, Zhuokai Zhao2,*, Lingyu Gao3, Hongyin Luo4, Huaxiu Yao1, Jiawei Zhou3

1UNC-Chapel Hill, 2University of Chicago, 3Toyota Technological Institute at Chicago, 4Massachusetts Institute of Technology
* Equal contribution
| Decoder | Minigpt4-v2 | Instructblip | LLaVA-1.5 | mPLUG-OWL2 |
|---|---|---|---|---|
| Greedy* | ✅ | ✅ | ✅ | ✅ |
| HALC* | ✅ | ✅ | ✅ | ✅ |
| OPERA-Beam | ✅ | ✅ | ✅ | ✅ |
| VCD | ✅ | ✅ | ✅ | ✅ |
| DoLa* | ✅ | ✅ | ✅ | ✅ |
*: indicates the method supports beam search.
| Post-hoc | Minigpt4-v2 | Instructblip | LLaVA-1.5 | mPLUG-OWL2 |
|---|---|---|---|---|
| Woodpecker | ✅ | ✅ | ✅ | ✅ |
| LURE | ✅ | ✅ | ✅ | ✅ |
Run the following commands to install the required packages:
git clone https://github.com/BillChan226/HALC.git
cd HALC
conda env create -f environment.yml
conda activate halc
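As an optional sanity check (illustrative, not part of the original instructions), confirm that the environment resolved a CUDA-enabled PyTorch build:

```bash
# Should print the torch version and True if a GPU is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```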
We employ Grounding DINO as the external detector to bound hallucinatory objects. To install GroundingDINO with CUDA support, we have simplified the installation process:
# set CUDA_HOME to the virtual environment halc
export CUDA_HOME=$CONDA_PREFIX
# install GroundingDINO
cd decoder_zoo/GroundingDINO
pip install -e .
# go back to HALC root
cd ../..
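As a quick check (a sketch, not part of the original steps), the editable install should make the `groundingdino` package importable from the `halc` environment:

```bash
# If this import fails, the CUDA extension likely did not build; re-check CUDA_HOME above.
python -c "from groundingdino.util.inference import load_model; print('GroundingDINO import OK')"
```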
To download pre-trained model weights for DINO:
# default directory that contains the weights
mkdir model_checkpoints
cd model_checkpoints
# download weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
# go back to HALC root
cd ..
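Because `wget -q` downloads silently, it is worth verifying that the checkpoint actually landed in `model_checkpoints/` (optional check):

```bash
# The SwinT-OGC checkpoint should be present and several hundred MB in size.
ls -lh model_checkpoints/groundingdino_swint_ogc.pth
```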
The following evaluation requires the MSCOCO 2014 dataset. Please download it here and extract it to your data path.
In addition, you need to prepare the following checkpoints of the 7B base models:
- Download the LLaVA-1.5 merged 7B model and specify it at Line 14 of `eval_configs/llava-1.5_eval.yaml`.
- Download the LLaMA-2 7B model and specify it at Line 15 of `minigpt4/configs/models/minigpt4_llama2.yaml`.
- Download the Vicuna 7B v1.1 model and specify it at Line 25 of `minigpt4/configs/models/blip2_instruct_vicuna7b.yaml`.
- Download the Vicuna 7B v0 model and specify it at Line 18 of `minigpt4/configs/models/minigpt4_vicuna0.yaml`.
- Download the MiniGPT-4 7B pretrained weights and specify them at Line 8 of `eval_configs/minigpt4_eval.yaml`.
- Download the MiniGPT-4 7B pretrained weights for LLaMA-2 and specify them at Line 8 of `eval_configs/minigpt4_llama2_eval.yaml`.
- Download the mPLUG-Owl2 7B pretrained weights and specify them at Line 14 of `eval_configs/mplug-owl2_eval.yaml`.
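One possible way to obtain these checkpoints is sketched below (illustrative only; the Hugging Face repository IDs are assumptions and are not specified by this README):

```bash
# Illustrative: fetch two of the base models from the Hugging Face Hub via git-lfs.
# Repository IDs are assumptions; the LLaMA-2 repo is gated and requires approved access.
git lfs install
git clone https://huggingface.co/liuhaotian/llava-v1.5-7b model_checkpoints/llava-v1.5-7b
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf model_checkpoints/llama-2-7b-chat-hf
# Then edit the config lines listed above (e.g., Line 14 of eval_configs/llava-1.5_eval.yaml)
# to point to these local directories.
```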
| Argument | Example | Description |
|---|---|---|
| `--model` | `llava-1.5` | Specify the MLLM model; this codebase supports `instructblip`, `minigpt4`, `llava-1.5`. |
| `--data-path` | `/path/to/dataset` | Path to the dataset file or folder, e.g., `COCO_2014/val2014/`. |
| `--pope-type` | `random` | Type for POPE evaluation; supports `random`, `popular`, `adversarial`. |
| `--beam` | `3` | Beam size for global search. Default: 1. |
| Argument | Example | Description |
|---|---|---|
| `--k-candidate-num` | `4` | Number of generative focal fields for local search. Default: 4. |
| `--expand-ratio` | `0.6` | The growing factor of focal fields. Default: 0.6. |
| `--detector` | `dino` | Detector to use, in [`dino`, `owlv2`]. Default: `dino`. |
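For concreteness, a run combining the general arguments with the HALC-specific ones might look like the sketch below (illustrative; the decoder name string passed to `-d` is an assumption, and the exact accepted values should be confirmed via the help output of the captioning script introduced later in this README):

```bash
# Illustrative HALC run on LLaVA-1.5; "halc" as the -d value is an assumption.
python chair_eval.py --model llava-1.5 --data-path /path/to/COCO_2014/val2014/ \
  -d halc --beam 1 \
  --k-candidate-num 4 --expand-ratio 0.6 --detector dino \
  --num_samples 500 --seed 42 --gpu-id 0 --output_dir ./generated_captions/
```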
| Argument | Example | Description |
|---|---|---|
| `--scale_factor` | `50` | The scale factor to scale up the self-attention weights. Default: 50. |
| `--threshold` | `15` | The threshold for attending retrospection. Default: 15. |
| `--num_attn_candidates` | `5` | The number of candidates per beam. Default: 5. |
| `--penalty_weights` | `1` | The weight of the penalty term in decoding. Default: 1. |
| Argument | Example | Description |
|---|---|---|
| `--cd-alpha` | `1` | Amplification factor. Default: 1. |
| `--cd-beta` | `0.1` | Truncation factor for the adaptive plausibility constraint. Default: 0.1. |
| `--noise-step` | `500` | Number of steps to add diffusion noise. Default: 500. |
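The baseline decoders take their decoder-specific arguments from the two tables above; a hedged sketch follows (the decoder name strings `opera` and `vcd` are assumptions):

```bash
# Illustrative OPERA and VCD runs; flag values mirror the defaults listed above.
python chair_eval.py --model llava-1.5 --data-path /path/to/COCO_2014/val2014/ \
  -d opera --scale_factor 50 --threshold 15 --num_attn_candidates 5 --penalty_weights 1 \
  --num_samples 500 --seed 42 --gpu-id 0 --output_dir ./generated_captions/
python chair_eval.py --model llava-1.5 --data-path /path/to/COCO_2014/val2014/ \
  -d vcd --cd-alpha 1 --cd-beta 0.1 --noise-step 500 \
  --num_samples 500 --seed 42 --gpu-id 0 --output_dir ./generated_captions/
```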
Following Evaluating Object Hallucination in Large Vision-Language Models, we used "Generate a short caption of the image" as the prompt to query the LVLM for captions of the 2,000
images randomly sampled from the COCO 2014 Val dataset. Under the root directory, run
python chair_eval.py --model [LVLM Backbone] --data-path [COCO_DIR] -d [Decoding Strategy] --num_samples 500 --seed [SEED] --gpu-id [GPU_IDs] --output_dir ./generated_captions/
For a full list of command line inputs, run `python generate_chair_input.py -h`. Note that `[COCO_DIR]` is expected to contain both images and annotation files within the `annotations` subfolder. In other words, `[COCO_DIR]` should have the following structure:
COCO_DIR (val2014 for example)
  - annotations
    - captions_train2014.json
    - captions_val2014.json
    - instances_train2014.json
    - instances_val2014.json
    - person_keypoints_train2014.json
    - person_keypoints_val2014.json
  - COCO_val2014_000000000042.jpg
  - COCO_val2014_000000000073.jpg
  ...
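One way to arrange a fresh COCO 2014 download into this layout (a sketch assuming the official `annotations_trainval2014.zip` archive; paths are placeholders):

```bash
# Place the official annotation JSONs inside an annotations/ subfolder next to the images.
COCO_DIR=/path/to/COCO_2014/val2014          # placeholder path
mkdir -p "$COCO_DIR/annotations"
unzip -q annotations_trainval2014.zip -d /tmp/coco_annotations
cp /tmp/coco_annotations/annotations/*2014.json "$COCO_DIR/annotations/"
```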
Upon completion, two files, `minigpt4_pretrain-llama2_coco_2000_generated_captions.json` and `minigpt4_pretrain-llama2_coco_2000_chair.json`, should be generated under `generated_captions/minigpt4_pretrain-llama2/coco/` if `llama2` is the `model_type` used for `minigpt4`.
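To confirm the run produced output, an optional, illustrative check that makes no assumption about the JSON schema:

```bash
# Pretty-print the head of the generated CHAIR file.
python -m json.tool generated_captions/minigpt4_pretrain-llama2/coco/minigpt4_pretrain-llama2_coco_2000_chair.json | head -n 20
```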
Under the root directory, run
python pope_eval.py --model [LVLM Backbone] --data_path [COCO_DIR] -d [Decoding Strategy] --pope_type [random/popular/adversarial] --num_images 100 --seed [SEED] --gpu_id [GPU_IDs] --output_dir ./generated_captions/
Under the root directory, run
python reviser_eval.py -r [woodpecker/lure] --data_path [COCO_DIR] --c [PATH_TO_CAPTION] --seed [SEED] --gpu-id [GPU_IDs] --output_dir ./log/
We use the generated `_chair.json` file, for example, `minigpt4_pretrain-llama2_coco_2000_chair.json`, for the CHAIR evaluation. Under the root directory, run
python eval_hallucination.py --metric chair --chair_input_path [PATH_TO_.JSON_FILE] -v
python eval_hallucination.py --metric pope --pope_answer_path [PATH_TO_MODEL_OUTPUT] --pope_question_path [PATH_TO_.POPE_QUESTION] -v
The evaluation results will be printed in the terminal.
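For example, to score the CHAIR file generated earlier (illustrative; the path reuses the example filename from the captioning step above):

```bash
python eval_hallucination.py --metric chair \
  --chair_input_path generated_captions/minigpt4_pretrain-llama2/coco/minigpt4_pretrain-llama2_coco_2000_chair.json -v
```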
Run the CDL demo on a toy example:
python context_density/context_decoding.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml --gpu-id 0
Specify `early_exit_layer_idx`, then run ViT early-exit-layer contrastive decoding:
python vit_early_exit_contrast.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml --gpu-id 0
Run the DoLa toy example:
python toy_dola_eval.py --model-name ./models/models--meta-llama--Llama-2-7b-chat-hf/snapshots/94b07a6e30c3292b8265ed32ffdeccfdadf434a8 --output-path output-path.json --num-gpus 1 --early-exit-layers 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32
Note: adding 32 to `--early-exit-layers` is crucial for reasonable output.
The JSD for each candidate layer is printed; see line 2720 of `DoLa/transformers-4.28.1/src/transformers/generation/utils.py`.
Run a toy example:
python contrast_decoding.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml --gpu-id 0
The toy example is projected into the prefix of the language model as a context.
If the error `NameError: name '_C' is not defined` is reported, refer to this issue for a quick fix.
conda install -c conda-forge pattern
conda install pytorch torchvision torchaudio pytorch-cuda=[YOUR NVIDIA CUDA VERSION] -c pytorch -c nvidia
Simply reinstalling `torch==2.0.0` will most likely solve the issue:
pip uninstall torch
pip install torch==2.0.0
This repository is under the BSD 3-Clause License. Much of the code is based on Lavis, which is also released under the BSD 3-Clause License here.