Training iterations and training loss #451

Open
DataCorrupted opened this issue Feb 25, 2025 · 7 comments

Comments

@DataCorrupted

DataCorrupted commented Feb 25, 2025

The current training script setup (see code below) doesn't seem to take loss into account.
The logic is simply "train for X iterations, then stop".

def train(self, dataset_iter, monitor_dict, num_iterations: int):
  """Trains policy with data from dataset_iter for num_iterations steps."""
  self._reset_metrics()
  # context management is implemented in decorator
  # pytype: disable=attribute-error
  # pylint: disable=not-context-manager
  with tf.summary.record_if(lambda: tf.math.equal(
      self._global_step % self._summary_export_interval, 0)):
    # pytype: enable=attribute-error
    for _ in range(num_iterations):
      # When the data is not enough to fill in a batch, next(dataset_iter)
      # will throw StopIteration exception, logging a warning message instead
      # of killing the training when it happens.
      try:
        experience = next(dataset_iter)
      except StopIteration:
        logging.warning(
            'Warning: skip training because do not have enough data to fill '
            'in a batch, consider increase data or reduce batch size.')
        break
      # random network distillation for intrinsic reward generation
      if self._random_network_distillation:
        experience = self._random_network_distillation.train(experience)

      loss = self._agent.train(experience)
      self._percentage_correct.reset_state()

      self._update_metrics(experience, monitor_dict)
      self._log_experiment(loss.loss)
      self._save_checkpoint()

This is somewhat problematic for us:

  1. Training takes a really long time; the model may already be good after X/2 iterations, yet we still have to wait for the full run.
  2. When training finishes, we are not confident how good the model is. It could be that the model is still under-trained (e.g. it needs 2X iterations to learn everything), or there could be something fundamentally wrong with the data.

One solution I have seen is to track a moving average of the loss and stop once it drops below a threshold (or once the X iterations are exhausted), as sketched below.
I'd like to start a discussion here on whether we should do that in the script.
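For concreteness, here is a minimal sketch of the kind of tracker I have in mind (the LossMovingAverage name and its API are hypothetical, not existing code in this repo):

import collections


class LossMovingAverage:
  """Keeps a running average of the training loss over a fixed window."""

  def __init__(self, window_size: int = 100):
    self._losses = collections.deque(maxlen=window_size)

  def update(self, loss: float) -> float:
    """Records a new loss value and returns the current windowed average."""
    self._losses.append(float(loss))
    return sum(self._losses) / len(self._losses)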

@alekh
Collaborator

alekh commented Feb 25, 2025

Since this is RL training, we do not know a priori how much reward is "good enough". Unlike in classification or regression problems, where we can say what an acceptable loss value is, that is much less clear here. So loss-based termination might not be great, since a generic threshold is not obvious. That said, if you have a clear threshold for your use case, please feel free to modify the script in the suggested manner.

@DataCorrupted
Author

DataCorrupted commented Feb 26, 2025

That argument doesn't make the current "run for X iterations" setting any easier: without a prior, it is also unclear what "X" is appropriate. I guess the root question here is: how do we define a "good", "trained" model?
You are right that in classification or regression problems we can say a loss close to 0 is good, but that is not the case here - I've seen negative loss during my training. What does the loss represent here?

Engineering-wise, maybe we can add an option "loss_threshold" such that, when provided, training stops once the running-average loss drops below it (see the sketch below).
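Roughly, sketched as a standalone loop - train_with_threshold and the loss_threshold argument are hypothetical, and the real Trainer.train() would keep its metrics/checkpoint/summary logic:

def train_with_threshold(agent, dataset_iter, num_iterations, loss_threshold=None):
  """Runs agent.train for at most num_iterations steps.

  If loss_threshold is given, stops as soon as the running-average loss
  drops below it; otherwise behaves like the existing fixed-length loop.
  """
  tracker = LossMovingAverage()
  for _ in range(num_iterations):
    try:
      experience = next(dataset_iter)
    except StopIteration:
      break
    loss = agent.train(experience)
    if loss_threshold is not None and tracker.update(loss.loss) < loss_threshold:
      break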

@boomanaiden154
Collaborator

If you're looking at the reward graphs, the reward is the percent reduction in size (for inlining) or the percent reduction in cycles/time (for regalloc). A negative reward just means that whatever was being measured regressed, which is pretty common in the initial stages of training, especially without warmstarting.

The normal way to run these scripts when time is sensitive is to start them, check TensorBoard until you're happy with the model's convergence, and then Ctrl+C the training process. It doesn't really make sense from an ML perspective to have a threshold in the gin files to stop training, because you don't know beforehand how long the model is going to take to converge or what level it will reach. In a continuous training scenario (i.e. redeploying new models every X weeks/months), you would just set the total number of steps to a reasonable amount above what it takes to converge and train for that long.

@mtrofin
Collaborator

mtrofin commented Feb 26, 2025

I think adding a threshold would be fine - basically, why not support both ways (i.e. if folks want a threshold -> fine; if not -> also fine)? If it turns out the threshold isn't really useful, it doesn't seem like something that would be hard to live with, or to remove.

@DataCorrupted would that work?

Separately, about iterations - it should also be possible to set the iteration count to something economical (small-ish, like a few 100K), then restart training using the model that came out of the first set of iterations as a warmstart.

I'm wondering - @alekh @boomanaiden154 - would it be a bad idea to optionally, automatically monitor the rate of improvement in the reward and say something like "if it's not improving by more than X after some number of iterations -> stop"?
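Something along these lines, say - a hypothetical patience-style check on the (averaged) reward rather than the loss; none of these names exist in the repo:

class RewardPlateauDetector:
  """Flags a stop once the best reward hasn't improved by more than
  min_delta for `patience` consecutive checks."""

  def __init__(self, min_delta: float, patience: int):
    self._min_delta = min_delta
    self._patience = patience
    self._best = float('-inf')
    self._stale_checks = 0

  def should_stop(self, reward: float) -> bool:
    """Records the latest reward; returns True once improvement has stalled."""
    if reward > self._best + self._min_delta:
      self._best = reward
      self._stale_checks = 0
    else:
      self._stale_checks += 1
    return self._stale_checks >= self._patience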

@DataCorrupted
Author

That sounds good to me.

Combining your first two thoughts, maybe we could do: "default to X iterations unless the user specifies a loss threshold (via one flag) or a number of iterations (via another flag)", roughly as sketched below.
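For example, with two hypothetical absl flags (the names are made up here; the repo's actual flags differ), keeping the iteration count as the default cap and the loss threshold as an opt-in:

from absl import flags

FLAGS = flags.FLAGS

# Hypothetical flags for illustration only.
flags.DEFINE_integer('num_iterations', 100000,
                     'Maximum number of training iterations (the default cap).')
flags.DEFINE_float('loss_threshold', None,
                   'If set, stop early once the running-average loss drops '
                   'below this value; the iteration cap still applies.')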

@alekh
Collaborator

alekh commented Feb 26, 2025 via email

@mtrofin
Collaborator

mtrofin commented Feb 26, 2025

Ack - I was thinking of the heuristic as more of a long-term thing. Short term, though, @DataCorrupted, if you have a patch for the loss_threshold, I'm happy to review it. Once you've gotten some mileage out of it, it'd be great if you could comment on its usefulness - maybe your setup is different from what we're doing (which is basically what @boomanaiden154 described), and it'd be good to learn from your scenario.
