This repo contains the Food-500 Cap dataset for our paper: Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models (ACM MM 2023).
We provide the descriptions in two files: finetune_data.json and evaluation_data.json. The images in Food-500 Cap come from ISIA Food-500; you can download them from here.
Note:
In our paper, we use all of the data to evaluate the VLMs. To make the dataset easier to use and results easier to compare, we have split it into a train set (finetune_data.json, 19760 pairs) and a test set (evaluation_data.json, 4940 pairs).
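As a quick sanity check after downloading, the two files can be loaded and their sizes compared with the pair counts listed above. This is a minimal sketch, not an official loader: the helper names `load_split` and `check_split` are hypothetical, and it assumes each top-level entry of a file (a list item or a dict key) corresponds to one image-description pair, which this README does not specify.

```python
import json
from pathlib import Path

# Pair counts as stated in this README.
EXPECTED_PAIRS = {
    "finetune_data.json": 19760,
    "evaluation_data.json": 4940,
}


def load_split(path):
    """Parse one description file and return the decoded JSON object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_split(path):
    """Return (actual, expected) top-level entry counts for a split file.

    Assumption: one top-level entry equals one image-description pair.
    `expected` is None when the file name is not one of the two known splits.
    """
    data = load_split(path)
    return len(data), EXPECTED_PAIRS.get(Path(path).name)
```

Calling `check_split("finetune_data.json")` from the repo root returns the observed and expected counts as a tuple. If the two numbers differ, the file layout probably does not match the assumption above, so inspect the structure returned by `load_split` before relying on it.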
@article{ma2023food,
title={Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models},
author={Ma, Zheng and Pan, Mianzhi and Wu, Wenhan and Cheng, Kanzhi and Zhang, Jianbing and Huang, Shujian and Chen, Jiajun},
journal={arXiv preprint arXiv:2308.03151},
year={2023}
}