I am observing an inconsistency in the NDCG even when my evaluation set includes only one query.
I pass a single query group with 20 rows of data as an evaluation set to fit (X_ex_small and y_ex_small below), which I will call the "ex_small" sample. For the "ex_small" sample, the NDCG@20 from XGBoost's fit + evals_result matches the one from score. However, I am not able to replicate that NDCG for the "ex_small" sample manually.
For the "ex_small" sample, one row of the 20 has a relevance of 4, two rows have a relevance of 1, and the rest have a relevance of 0. If I visualize the rows with the code lines below, I see that the row with a relevance of 4 is ranked 18th of 20 by predicted score and the two rows with a relevance of 1 are ranked 5th and 14th of 20.
# Visualizing the predicted relevance compared to actual relevance
y_ex_small_pred = ranker.predict(X_ex_small)
temp = y_ex_small.copy()
df_temp = temp.to_frame()
df_temp["pred"] = y_ex_small_pred
df_temp.sort_values(by="pred", ascending=False)
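The rank positions quoted above can also be read off without pandas. The following is a minimal sketch, not the code used in this report: the helper name `ranks_of_relevant` and the four-row toy data are hypothetical stand-ins for y_ex_small and the ranker's predictions.

```python
def ranks_of_relevant(labels, scores):
    """Return (1-based rank, label) for every row with a positive label,
    where rank 1 is the row with the highest predicted score."""
    # Sort row indices by predicted score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(pos, labels[i]) for pos, i in enumerate(order, start=1) if labels[i] > 0]

# Hypothetical toy data, not the real ex_small sample.
toy_labels = [0, 4, 1, 0]
toy_scores = [0.9, 0.1, 0.5, 0.3]
toy_ranks = ranks_of_relevant(toy_labels, toy_scores)
print(toy_ranks)
```

With the real labels and predictions, this should list the same positions as the sorted DataFrame (5th, 14th and 18th in the "ex_small" case).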
If I try to compute the NDCG at k=20 by hand, I do not get the same NDCG@20 as the XGBoost functions.
# Manually computing the ndcg@20 for the small ex athlete -- results do not match xgboost score or fit outputs
from math import log2
dcg20 = (1/log2(1+5)) + (1/log2(1+14)) + (4/log2(1+18))  # predicted ranking
idcg20 = (4/log2(1+1)) + (1/log2(1+2)) + (1/log2(1+3))  # optimal ranking
dcg20/idcg20  # returns: 0.3088029970412347
I read in the documentation that there might be issues because not all functions take the qid into account. However, the "ex_small" sample contains only one query id, so I expected to be able to replicate the NDCG by hand. Can you help me understand why this is occurring?
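For reference, the hand computation can be written as a small function that is easy to re-run under different gain definitions. This is a sketch under stated assumptions: the names `gain` and `ndcg_at_k` are hypothetical, and the positions are the ones reported above. The hand computation uses the raw label as the gain. XGBoost's learning-to-rank parameters include `ndcg_exp_gain`, which as far as I know defaults to true in version 2.0 and later and switches the gain to 2^label - 1. Whether that accounts for the mismatch here is an assumption and has not been verified on this data.

```python
from math import log2

def gain(label, exp_gain):
    # Linear gain uses the label itself; exponential gain uses 2**label - 1.
    return (2 ** label - 1) if exp_gain else label

def ndcg_at_k(positions, k, exp_gain=False):
    """positions: list of (1-based rank, relevance label) for rows with label > 0.
    Rows with label 0 contribute nothing to DCG or IDCG, so they can be omitted."""
    dcg = sum(gain(lab, exp_gain) / log2(1 + pos) for pos, lab in positions if pos <= k)
    # Ideal ordering: highest labels first, truncated at k.
    ideal = sorted((lab for _, lab in positions), reverse=True)[:k]
    idcg = sum(gain(lab, exp_gain) / log2(1 + i) for i, lab in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Positions from the "ex_small" sample: relevance 1 at ranks 5 and 14, relevance 4 at rank 18.
positions = [(5, 1), (14, 1), (18, 4)]
linear = ndcg_at_k(positions, k=20)                      # same formula as the hand computation
exponential = ndcg_at_k(positions, k=20, exp_gain=True)  # assumed XGBoost-style gain
print(linear, exponential)
```

The linear variant reproduces the 0.3088 value computed by hand. Comparing the exponential variant with the value XGBoost reports would show whether the gain definition is the source of the difference.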
cjsombric changed the title from "Inconsistency with NDCG for ranker" to "Inconsistency with NDCG for XGBRanker" on Feb 10, 2025