Hi, thanks for your great work. When I evaluate 'models--allenai--Llama-3.1-Tulu-3-8B' on the MATH dataset using this codebase, the accuracy is exactly zero. Is there a format mismatch in the evaluation? (With the same model, I get ~88% exact match on GSM8K.)
I guess the problem is caused by the stop_strings:
if not args.use_chat_format:
    stop_strings += ["\n"]
After commenting these two lines out, I get ~45% accuracy.
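To illustrate what I suspect is happening, here is a minimal sketch. It is not the codebase's actual generation code: `truncate_at_stop` and the sample completion are hypothetical, and I am assuming the model writes its reasoning over several lines before giving the boxed answer.

```python
def truncate_at_stop(text, stop_strings):
    """Cut text at the earliest occurrence of any stop string (hypothetical helper)."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Hypothetical multi-line completion; the final answer sits on the second line.
completion = "We have 2 + 3 = 5.\nSo the final answer is \\boxed{5}."

# With "\n" as a stop string, generation ends after the first line.
with_newline_stop = truncate_at_stop(completion, ["\n"])
# Without it, the full completion (including the boxed answer) survives.
without_newline_stop = truncate_at_stop(completion, [])

print(repr(with_newline_stop))
print("\\boxed" in with_newline_stop)
print("\\boxed" in without_newline_stop)
```

If the real outputs look like this, every prediction would be cut off before the answer, which would explain an accuracy of exactly zero.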