MMTEB inconsistency #2026
Hi, where is this table from? I was mostly referring to the datasets in https://huggingface.co/spaces/mteb/leaderboard => the Task information tab.
Oh, I didn't know that you can choose different benchmarks at the top. Do you provide any aggregate scores for all these different MTEBs (one single score for all datasets in all MTEBs)?
Maybe we should change the UI to make that more prominent.
Just leaving other discrepancies between the paper and the leaderboard: Table 1 and L427 say MTEB(eng) has 40 tasks and L259 says 26 tasks, but it actually has 41 tasks on the leaderboard and in Table 16. MTEB(multilingual) also has 132 tasks on the leaderboard, but 131 tasks in the paper (Table 1 and L242).
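A quick way to double-check such counts against the released library is sketched below. It assumes the installed `mteb` package exposes `get_benchmark()` and that these benchmark identifiers match what the library actually uses; adjust the names as needed.

```python
# Minimal sketch: count tasks per benchmark with the mteb library to
# cross-check the numbers reported in the paper. The benchmark names
# below are assumptions; use whatever identifiers your version exposes.
import mteb

for name in ["MTEB(eng)", "MTEB(Multilingual)"]:
    benchmark = mteb.get_benchmark(name)
    print(f"{name}: {len(benchmark.tasks)} tasks")
```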
This is from the "Performance per Task" section
Not currently, though we could create an Overview leaderboard (probably not for all of them, but for a selection).
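For illustration only, one way such a single score could be defined is a macro-average of per-benchmark means, so that large benchmarks do not dominate small ones. All benchmark names and scores below are made-up placeholders, not actual leaderboard data.

```python
# Rough sketch of one possible aggregate: mean over per-benchmark means.
# All names and scores are hypothetical placeholders.
from statistics import mean

per_task_scores = {
    "MTEB(eng)": {"TaskA": 0.71, "TaskB": 0.64},
    "MTEB(Multilingual)": {"TaskC": 0.58, "TaskD": 0.61, "TaskE": 0.66},
    "MTEB(code)": {"TaskF": 0.49, "TaskG": 0.55},
}

benchmark_means = {b: mean(s.values()) for b, s in per_task_scores.items()}
overall = mean(benchmark_means.values())
print({b: round(m, 3) for b, m in benchmark_means.items()}, f"overall={overall:.3f}")
```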
Hmm, for the Table 1 case you might have had an early version (at least in the latest version, the 26 has been corrected). Fixed the 131 though (we added MiraclRetrievalHardNegatives, as it was added after the initial paper submission).

For the overview tables (e.g. Table 16) it seems like the current tables were recreated from the wrong branch. I have recreated them to match the latest update. (@imenelydiaker, do you mind reviewing these? I believe you might have used an earlier version of v2.)

@jhyuklee, if you want the updated version of the paper I will gladly forward you a copy.
That'd be great! I can review the updated version if you want (email: [email protected]). I also wonder whether you intended MMTEB to refer to only MTEB(multilingual) or to the set of all MTEBs, including MTEB(en), MTEB(code), and so on. Since the leaderboard shows only MTEB(multilingual) by default, people tend to believe that MTEB(multilingual) is MMTEB, but after reading the paper I thought the latter was the case.
Probably more the latter - making an overview page would be quite decent. Something like:
You could argue for Chinese as well.
On the overview page, would it be possible to have some organization/grouping of the benchmarks? For example, as one potential ordering: lead with the most general MTEBs, followed by specific domains (Code, Law, Medical), then specific regions, and then individual languages. A sketch of such a grouping follows below.
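To make the grouping suggestion concrete, here is a purely illustrative sketch; the benchmark names and categories are hypothetical, not the leaderboard's actual set.

```python
# Illustrative ordering for an overview page: general benchmarks first,
# then domains, regions, and single languages. Names are placeholders.
overview_groups = {
    "General": ["MTEB(Multilingual)", "MTEB(eng)"],
    "Domain": ["MTEB(code)", "MTEB(Law)", "MTEB(Medical)"],
    "Region": ["MTEB(Europe)", "MTEB(Indic)"],
    "Language": ["MTEB(fra)", "MTEB(deu)", "MTEB(cmn)"],
}

for group, benchmarks in overview_groups.items():
    print(f"{group}: {', '.join(benchmarks)}")
```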
* Add SONAR metadata: add SONAR metadata, but without an implementation. Fixes #1981
* fix: Add SONAR metadata. Fixes #1981
* minor edits
* reduced logging severity of no model_meta.json
* resolve missing models by ensuring that "Unknown" number of parameters is not filtered. Should resolve: #1979 #1976. This seems to have been caused by the restructuring of calls on the leaderboard.
* format
* resolve missing models by ensuring that "Unknown" number of parameters is not filtered. Should resolve: #1979 #1976. This seems to have been caused by the `MAX_MODEL_SIZE/MIN_MODEL_SIZE` args.
* format
* format
* added memory usage
* fixed None checks
* consistently refer to tasks as tasks, not as datasets. Addresses #2026
* minor
* removed unused arg
* revert fix of not allowing None in model name
Have added a new issue on this and will then close this one.
From @jhyuklee