
Metrics #22

Open
zwhus opened this issue Mar 17, 2025 · 4 comments
Comments

zwhus commented Mar 17, 2025

I noticed that the result of E5-V on COCO retrieval reported in the main text is 52 / 62, but Appendix C lists the results for the remaining MLLM backbones, as shown in the figure below. The COCO result looks abnormal to me.

[Image: Appendix C table of MLLM-based retrieval results]
On the one hand, why is there such a significant difference from LLaVA-NeXT-8B? On the other hand, these two results don't seem to align with the top-1 (R@1) metrics for COCO I2T or T2I. Could you clarify the source of the metrics in this table, or provide the corresponding metrics for Phi-3V?

kongds (Owner) commented Mar 17, 2025

The results in this picture are from the setting without fine-tuning, i.e., we only use prompts to obtain them. The E5-V results come from fine-tuning on text pairs, as reported in the red box in the following table. The COCO results are 76.5/83.6, which match the T2I/I2T R@5 metrics in Table 1.

[Image: table with the fine-tuned E5-V results marked by a red box]
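
For anyone tripping over the R@1 vs. R@5 distinction discussed here, below is a minimal sketch of how Recall@K is typically computed for bidirectional image-text retrieval. This is generic evaluation logic, not this repository's actual code, and it simplifies COCO by assuming one caption per image (COCO actually has 5 captions per image, where I2T counts a hit if any of them is retrieved).

```python
import torch

def recall_at_k(query_emb: torch.Tensor, gallery_emb: torch.Tensor, k: int) -> float:
    """query_emb: (N, D), gallery_emb: (N, D); row i of each is a matched pair."""
    # Cosine similarity between every query and every gallery item.
    q = torch.nn.functional.normalize(query_emb, dim=-1)
    g = torch.nn.functional.normalize(gallery_emb, dim=-1)
    sims = q @ g.T                                 # (N, N) similarity matrix
    topk = sims.topk(k, dim=-1).indices            # k nearest gallery items per query
    targets = torch.arange(q.shape[0]).unsqueeze(-1)  # ground-truth index per query
    hits = (topk == targets).any(dim=-1)           # hit if the true match is in the top k
    return hits.float().mean().item()

# Example: 100 image/caption pairs with 64-dim embeddings (random here).
img = torch.randn(100, 64)
txt = torch.randn(100, 64)
print("I2T R@1:", recall_at_k(img, txt, 1))  # images as queries
print("T2I R@5:", recall_at_k(txt, img, 5))  # captions as queries
```

R@5 only requires the true match to appear anywhere in the top 5, so it is always at least as high as R@1, which is why the 76.5/83.6 numbers cannot be compared directly against top-1 metrics.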

zwhus (Author) commented Mar 17, 2025

Thanks, but could you provide the relevant metrics (T2I/I2T R@1) on COCO and Flickr30k for Phi-3V in the with-fine-tuning setting?

kongds (Owner) commented Mar 17, 2025

Sorry, but I only kept the R@5 results for Phi-3V in the fine-tuning setting.

zwhus (Author) commented Mar 17, 2025

Thanks
