
Switch to Discriminative Image Captioning

The code for our paper, Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning (WACV 2023). The methods implemented here provide a switch to discriminative image captioning: given off-the-shelf captioning models trained with reinforcement learning, our methods enable them to describe the characteristic details of input images with only lightweight fine-tuning.

Acknowledgment

The code is based on self-critical.pytorch. We thank the authors of that repository, the original neuraltalk2, and the awesome PyTorch team.

Setup

git clone https://github.com/ukyh/switch_disc_caption.git
cd switch_disc_caption
git submodule update --init --recursive

conda create --name switch_disc_cap python=3.6
conda activate switch_disc_cap

pip install -r requirements.txt
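As a quick sanity check that the environment installed correctly (this assumes PyTorch is pulled in by requirements.txt, as the code depends on it):

python -c "import torch; print(torch.__version__)"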

Downloads

  1. Follow the instructions in data/README.md to download and preprocess the data.
  2. Follow the instructions in coco-caption/README.md to download the evaluation tools.
  3. Download pre-trained models from MODEL_ZOO.md. We used Att2in+self_critical (att2in_scst), UpDown+self_critical (updown_scst), and Transformer+self_critical (trans_scst) for the experiments in our paper. To run the scripts in expt_scripts, the downloaded models must be placed as follows (see the sketch after this list):
./saved_models/
  ├── att2in_scst/
  │     ├── model-best.pth
  │     └── infos_a2i2_sc-best.pkl
  ├── updown_scst/
  │     ├── model-best.pth
  │     └── infos_tds_sc-best.pkl
  └── trans_scst/
        ├── model-best.pth
        └── infos_trans_scl-best.pkl
  4. (Optional: not necessary if you just want to try our fine-tuning)
    If you want to train RL models in this repo, build the cache for calculating CIDEr scores:
python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
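As a rough sketch of step 3, assuming the pre-trained models were downloaded and unpacked under ~/Downloads (the source paths below are hypothetical; adjust them to wherever you saved the files), the required layout can be created like this:

# Hypothetical source paths; repeat the pattern for updown_scst and trans_scst.
mkdir -p saved_models/att2in_scst saved_models/updown_scst saved_models/trans_scst
cp ~/Downloads/att2in_scst/model-best.pth saved_models/att2in_scst/
cp ~/Downloads/att2in_scst/infos_a2i2_sc-best.pkl saved_models/att2in_scst/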

Run

Fine-Tuning

Run sh expt_scripts/[SELECT_SCRIPT].sh.
This produces a fine-tuned model under saved_models and .json output files (for the MS COCO Karpathy val/test splits) under eval_results.
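For example (the script name below is hypothetical; check expt_scripts for the actual file names):

ls expt_scripts/                   # list the available fine-tuning scripts
sh expt_scripts/updown_scst_ft.sh  # substitute a script that actually exists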

We have released the fine-tuned models and output files here.

Evaluation

Evaluation uses the output files under eval_results. Use the following repositories/scripts for evaluation on each metric.
NOTE: DO NOT use the files starting with tmpeval_, as the decoding settings of those outputs (beam size and BP decoding) are not specified correctly.
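As a quick way to pick out the valid output files in a POSIX shell, skipping the tmpeval_ ones:

ls eval_results/ | grep -v '^tmpeval_'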

Reference

If you find this repo useful, please consider citing (no obligation at all):

@inproceedings{honda2023switch,
  title={Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning},
  author={Honda, Ukyo and Watanabe, Taro and Matsumoto, Yuji},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2023}
}

@article{luo2018discriminability,
  title={Discriminability objective for training descriptive captions},
  author={Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory},
  journal={arXiv preprint arXiv:1803.04376},
  year={2018}
}
