
Metrics #22

Open
zwhus opened this issue Mar 17, 2025 · 4 comments
Comments

zwhus commented Mar 17, 2025

I noticed that the result of E5-V on COCO retrieval reported in the main text is 52 / 62, but Appendix C lists the results for the remaining MLLM backbones, as shown in the figure below. The COCO result looks abnormal to me.

[Image: Appendix C table of MLLM-based retrieval results]
On the one hand, why is there such a significant difference from LLaVA-NeXT-8B? On the other hand, these two results don't seem to align with the top-1 (R@1) metrics for COCO I2T or T2I. Could you clarify the source of the metrics in this table, or provide the corresponding metrics for Phi-3V?

kongds (Owner) commented Mar 17, 2025

The results in this picture are from the setting without fine-tuning, i.e., we only use prompts to obtain them. The E5-V results come from fine-tuning on text pairs, as reported in the red box in the following table. The COCO results are 76.5/83.6, which match the T2I/I2T R@5 metrics in Table 1.

[Image: table with the fine-tuned E5-V results marked by a red box]
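
For anyone tripping over the R@1 vs. R@5 distinction discussed here, below is a minimal sketch of how Recall@K is typically computed for bidirectional image-text retrieval. This is generic evaluation logic, not this repository's actual code, and it simplifies COCO by assuming one caption per image (COCO actually has 5 captions per image, where I2T counts a hit if any of them is retrieved).

```python
import torch

def recall_at_k(query_emb: torch.Tensor, gallery_emb: torch.Tensor, k: int) -> float:
    """query_emb: (N, D), gallery_emb: (N, D); row i of each is a matched pair."""
    # Cosine similarity between every query and every gallery item.
    q = torch.nn.functional.normalize(query_emb, dim=-1)
    g = torch.nn.functional.normalize(gallery_emb, dim=-1)
    sims = q @ g.T                                 # (N, N) similarity matrix
    topk = sims.topk(k, dim=-1).indices            # k nearest gallery items per query
    targets = torch.arange(q.shape[0]).unsqueeze(-1)  # ground-truth index per query
    hits = (topk == targets).any(dim=-1)           # hit if the true match is in the top k
    return hits.float().mean().item()

# Example: 100 image/caption pairs with 64-dim embeddings (random here).
img = torch.randn(100, 64)
txt = torch.randn(100, 64)
print("I2T R@1:", recall_at_k(img, txt, 1))  # images as queries
print("T2I R@5:", recall_at_k(txt, img, 5))  # captions as queries
```

R@5 only requires the true match to appear anywhere in the top 5, so it is always at least as high as R@1, which is why the 76.5/83.6 numbers cannot be compared directly against top-1 metrics.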

zwhus (Author) commented Mar 17, 2025

Thanks, but could you provide the relevant metrics (T2I/I2T R@1) on COCO and Flickr30k for Phi-3V in the with-fine-tuning setting?

kongds (Owner) commented Mar 17, 2025

Sorry, but I only kept the R@5 results for Phi-3V in the fine-tuning setting.

zwhus (Author) commented Mar 17, 2025

Thanks
