Training iterations and training loss #451
Comments
Since this is RL training, we do not know a priori how much reward is "good enough". Unlike in classification or regression problems, where we can say what an acceptable value of the loss is, that is much less clear here. So a loss-based termination might not be great, since a generic threshold is not obvious. That said, if you have a clear threshold for your use case, please feel free to modify the script in the suggested manner.
That argument doesn't make the current setting of "run for X iterations" any easier: without a prior, it is also unclear what "X" is appropriate. I guess the root question here is, how do we define a "good", "trained" model? Engineering-wise, maybe we can add an option loss_threshold such that, when provided, training is stopped once the running average of the loss drops below it.
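A minimal sketch of what such an option could look like (the class and parameter names are hypothetical, not part of the existing trainer):

```python
# Hypothetical sketch of a running-average loss stop; not the actual trainer.py API.
from collections import deque


class RunningAverageLossStop:
  """Signals that training should stop once the running-average loss falls below a threshold."""

  def __init__(self, loss_threshold: float, window: int = 100):
    self._loss_threshold = loss_threshold
    self._losses = deque(maxlen=window)

  def update(self, loss: float) -> bool:
    """Records a new loss value; returns True when training should stop."""
    self._losses.append(loss)
    if len(self._losses) < self._losses.maxlen:
      return False  # Not enough samples yet for a stable average.
    return sum(self._losses) / len(self._losses) < self._loss_threshold
```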
If you're looking at the reward graphs, the reward is the percent reduction in size or the percent reduction in cycles/time for inlining and regalloc respectively. A negative reward just means that what was being measured regressed, which is pretty common in the initial stages of training, especially without warmstarting. The normal way to run these scripts if time is sensitive is to start them, check TensorBoard until you're happy with the model's convergence, and then Ctrl+C the training process. It doesn't really make sense from an ML perspective to have some threshold in the gin files to stop training, because you don't know beforehand how long the model is going to take to converge or what its final level will be. In a continuous training scenario (i.e. redeploying new models every X weeks/months), you would just set the total number of steps to a reasonable amount above what it takes to converge and train for that amount of time.
I think adding a threshold would be fine; basically, why not have both ways (i.e. if folks want a threshold -> fine; if not -> also fine). If it turns out the threshold isn't really useful, it doesn't seem like something that'd be hard to live with even if it isn't removed. @DataCorrupted would that work? Separately, about iterations: it should also be possible to set the iterations to something economical (small-ish, like a few 100K), then restart training using the model that came out of the first set of iterations as a warmstart. I'm wondering - @alekh @boomanaiden154 - would it be a bad idea to optionally monitor the rate of improvement in the reward automatically and say something like "if it's not improving by more than X after some number of iterations -> stop"?
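A hedged sketch of such a reward-plateau heuristic (all names are hypothetical and only illustrate the idea):

```python
# Hypothetical sketch of "stop when the reward stops improving"; not part of ml-compiler-opt.
class RewardPlateauStop:
  """Signals a stop when the best reward has not improved by min_delta in patience checks."""

  def __init__(self, min_delta: float, patience: int):
    self._min_delta = min_delta
    self._patience = patience
    self._best_reward = float('-inf')
    self._stale_checks = 0

  def update(self, reward: float) -> bool:
    """Records the latest reward (e.g. a per-iteration average); returns True to stop."""
    if reward > self._best_reward + self._min_delta:
      self._best_reward = reward
      self._stale_checks = 0
    else:
      self._stale_checks += 1
    return self._stale_checks >= self._patience
```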
That sounds good to me. Combining your first two thoughts, maybe we could do "default to X iterations unless the user specifies an error threshold (via one flag) or a number of iterations (via another flag)".
I haven't monitored your training runs carefully enough with PPO to see how continuously and monotonically the loss improves. You could put in a heuristic like this, but I'm generally concerned about the additional hyperparameters that get added in doing so.
Alekh
Ack - I was thinking of the heuristic as more of a long-term item. Short term, though, @DataCorrupted, if you have a patch for the …
The current training script setup (see the reference below) doesn't seem to care about the loss; the logic is more "train for X iterations, then stop".
ml-compiler-opt/compiler_opt/rl/trainer.py, lines 201 to 232 (at commit 2136716)
This is somewhat problematic for us:
One solution I saw is to use a moving average of the loss and stop once it drops below a threshold (or once the X iterations are exhausted).
I want to start a discussion here on whether we should do that in the script.
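A rough sketch of how the stop condition could combine the existing iteration cap with the proposed moving-average threshold (the function and parameter names are hypothetical; this is not the actual trainer.py code):

```python
# Hypothetical sketch only: combines the "train for X iterations" cap with a
# moving-average loss threshold. step_fn stands in for one real training iteration.
from collections import deque
from typing import Callable, Optional


def train(step_fn: Callable[[], float],
          num_iterations: int,
          loss_threshold: Optional[float] = None,
          window: int = 100) -> None:
  recent_losses = deque(maxlen=window)
  for it in range(num_iterations):
    recent_losses.append(step_fn())  # step_fn runs one iteration and returns its loss
    if (loss_threshold is not None and len(recent_losses) == window and
        sum(recent_losses) / window < loss_threshold):
      print(f'Stopping early at iteration {it}: moving-average loss below threshold.')
      break
```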