SFT Loss unable to decrease on MATH data #482
Comments
I'd say this looks normal, @yxchng! The loss goes down very slowly for SFT; the biggest delta is seen between epochs. This screenshot is from one of our OLMo 2 13B Instruct SFT runs for 2 epochs over the SFT data -- see the number of steps.
@natolambert I am seeing the same thing. Just to confirm, is this expected behavior? I assume it's because of the sum loss, right?
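On the "sum loss" point: a minimal sketch of why a summed loss can look different from an averaged one, assuming "sum loss" means the per-token losses in a batch are summed rather than averaged. The helper `reduce_token_losses` and the batch values are hypothetical and only illustrate the arithmetic; they are not taken from the open-instruct code.

```python
def reduce_token_losses(token_losses, reduction="mean"):
    """Reduce per-token losses for one batch (hypothetical helper).

    Assumes prompt and padding tokens have already been filtered out.
    """
    if reduction == "sum":
        return sum(token_losses)
    if reduction == "mean":
        return sum(token_losses) / len(token_losses)
    raise ValueError(f"unknown reduction: {reduction}")


# Two made-up batches with the same per-token loss but different token counts.
short_batch = [0.5] * 100
long_batch = [0.5] * 400

mean_short = reduce_token_losses(short_batch, "mean")
mean_long = reduce_token_losses(long_batch, "mean")
sum_short = reduce_token_losses(short_batch, "sum")
sum_long = reduce_token_losses(long_batch, "sum")

# The mean is identical for both batches; the sum scales with the token count.
print(mean_short, mean_long)
print(sum_short, sum_long)
```

Under this assumption, the reported value of a summed loss moves with the number of tokens in each batch, so step-to-step variation in the curve can come from batch composition as well as from how well the model fits the data.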
@berserkr can you share more -- the loss doesn't go down a ton in IFT? Share plots?
@natolambert here is the loss for one of the models I am testing. It is an MoE variant.
Hey @berserkr and others here. A few things.
@natolambert I will wait for it to complete and perhaps try another epoch. I am using the default 70B parameters. MMLU, for example, goes from 67.7 to 67.8, which is very little improvement. I will complete another full run before I report more :) Thank you!
Also @berserkr MMLU is a tricky eval for SFT. Something more specific is usually easier to debug :)
I am trying to do SFT following this doc, https://github.com/allenai/open-instruct/blob/main/docs/tulu3.md.
My data are formatted as follows:
However, my loss is as follows and is not decreasing:
I am following the Tulu 3 SFT launch command exactly, as follows:
Can I know if I am doing anything wrong?