Prepare COCO data

Download COCO captions and preprocess them

Download the preprocessed COCO captions from link on Karpathy's homepage. Extract dataset_coco.json from the zip file and copy it into data/. This file provides the preprocessed captions and the standard train/val/test splits.
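
To confirm that the file was copied correctly, you can count the images and captions per split. This is a minimal sketch, not part of this repository. It assumes the usual Karpathy layout (a top-level "images" list whose entries carry a "split" string and a "sentences" list); the helper names count_splits and load_dataset are made up for illustration.

```python
import json
from collections import Counter


def count_splits(dataset):
    """Count images and captions per split.

    Assumes each entry of dataset["images"] has a "split" string and a
    "sentences" list (assumed layout of dataset_coco.json).
    """
    images = Counter()
    captions = Counter()
    for img in dataset["images"]:
        split = img["split"]
        images[split] += 1
        captions[split] += len(img["sentences"])
    return images, captions


def load_dataset(path="data/dataset_coco.json"):
    """Load the caption file from the location used in this README."""
    with open(path) as f:
        return json.load(f)


# Example usage (requires data/dataset_coco.json to exist):
# images, captions = count_splits(load_dataset())
# for split in sorted(images):
#     print(split, images[split], captions[split])
```

If a split is missing or the counts look far too small, the file was probably extracted or copied incorrectly.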

Then, download cocotalk_disc_text.zip and unzip it into data/.
unzip cocotalk_disc_text.zip -d data/

NOTE: Please use the files in cocotalk_disc_text.zip so that the word-to-index mapping is exactly the same as the one used by the pretrained models.

Image features: Bottom-up features (current standard)

Convert from peteanderson80's original file

Download the pre-extracted features from link. You can download either the adaptive version or the fixed version.

For example:

mkdir data/bu_data; cd data/bu_data
wget https://imagecaption.blob.core.windows.net/imagecaption/trainval.zip
unzip trainval.zip

Then:

python scripts/make_bu_data.py --output_dir data/cocobu

This will create data/cocobu_fc, data/cocobu_att and data/cocobu_box. To use the bottom-up features, replace every "cocotalk" with "cocobu" in the training/test scripts.
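
The replacement described above is a plain string substitution. The sketch below shows it on a command line held in a string. The helper name use_bottom_up is made up, and the flags in the example command (--input_fc_dir, --input_att_dir) are only illustrative assumptions; check your own training/test scripts for the actual arguments.

```python
def use_bottom_up(text):
    """Return text with every "cocotalk" replaced by "cocobu"."""
    return text.replace("cocotalk", "cocobu")


# Illustrative command only; the script name and flags are assumptions.
command = (
    "python train.py "
    "--input_fc_dir data/cocotalk_fc "
    "--input_att_dir data/cocotalk_att"
)
print(use_bottom_up(command))
```

The same function can be applied to the full text of a shell script before writing it back to disk.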

Download converted files

bottomup-att: link

Acknowledgment

similar_set_id/ is provided by https://github.com/WangJiuniu/DistinctiveCap. Thanks to the authors.
