Training iterations and training loss #451

Open
DataCorrupted opened this issue Feb 25, 2025 · 7 comments

Comments

@DataCorrupted

DataCorrupted commented Feb 25, 2025

The current training script setup (see code below) doesn't seem to take loss into account.
The logic is simply "train for X iterations, then stop".

def train(self, dataset_iter, monitor_dict, num_iterations: int):
  """Trains policy with data from dataset_iter for num_iterations steps."""
  self._reset_metrics()
  # context management is implemented in decorator
  # pytype: disable=attribute-error
  # pylint: disable=not-context-manager
  with tf.summary.record_if(lambda: tf.math.equal(
      self._global_step % self._summary_export_interval, 0)):
    # pytype: enable=attribute-error
    for _ in range(num_iterations):
      # When the data is not enough to fill in a batch, next(dataset_iter)
      # will throw StopIteration exception, logging a warning message instead
      # of killing the training when it happens.
      try:
        experience = next(dataset_iter)
      except StopIteration:
        logging.warning(
            'Warning: skip training because do not have enough data to fill '
            'in a batch, consider increase data or reduce batch size.')
        break
      # random network distillation for intrinsic reward generation
      if self._random_network_distillation:
        experience = self._random_network_distillation.train(experience)

      loss = self._agent.train(experience)
      self._percentage_correct.reset_state()

      self._update_metrics(experience, monitor_dict)
      self._log_experiment(loss.loss)
      self._save_checkpoint()

This is somewhat problematic for us:

  1. Training takes a really long time; the model may already be good after X/2 iterations, yet we still have to wait for the full run.
  2. When training finishes, we are not confident how good the model is. It could be that the model is still under-trained (e.g. it needs 2X iterations to learn everything), or there could be something fundamentally wrong with the data.

One solution I have seen is to track a moving average of the loss and stop once it drops below a threshold (or once the X iterations are exhausted), as sketched below.
I'd like to start a discussion here on whether we should do that in the script.
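For concreteness, here is a minimal sketch of the kind of tracker I have in mind (the LossMovingAverage name and its API are hypothetical, not existing code in this repo):

import collections


class LossMovingAverage:
  """Keeps a running average of the training loss over a fixed window."""

  def __init__(self, window_size: int = 100):
    self._losses = collections.deque(maxlen=window_size)

  def update(self, loss: float) -> float:
    """Records a new loss value and returns the current windowed average."""
    self._losses.append(float(loss))
    return sum(self._losses) / len(self._losses)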

@alekh
Collaborator

alekh commented Feb 25, 2025

Since this is RL training, we do not know a priori how much reward is "good enough". Unlike in classification or regression problems, where we can say what an acceptable loss value is, that is much less clear here. So loss-based termination might not be great, since a generic threshold is not obvious. That said, if you have a clear threshold for your use case, please feel free to modify the script in the suggested manner.

@DataCorrupted
Author

DataCorrupted commented Feb 26, 2025

That argument doesn't make the current "run for X iterations" setting any easier: without a prior, it is also unclear what "X" is appropriate. I guess the root question here is: how do we define a "good", "trained" model?
You are right that in classification or regression problems we can say a loss close to 0 is good, but that is not the case here - I've seen negative loss during my training. What does the loss represent here?

Engineering-wise, maybe we can add an option "loss_threshold" such that, when provided, training stops once the running-average loss drops below it (see the sketch below).
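Roughly, sketched as a standalone loop - train_with_threshold and the loss_threshold argument are hypothetical, and the real Trainer.train() would keep its metrics/checkpoint/summary logic:

def train_with_threshold(agent, dataset_iter, num_iterations, loss_threshold=None):
  """Runs agent.train for at most num_iterations steps.

  If loss_threshold is given, stops as soon as the running-average loss
  drops below it; otherwise behaves like the existing fixed-length loop.
  """
  tracker = LossMovingAverage()
  for _ in range(num_iterations):
    try:
      experience = next(dataset_iter)
    except StopIteration:
      break
    loss = agent.train(experience)
    if loss_threshold is not None and tracker.update(loss.loss) < loss_threshold:
      break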

@boomanaiden154
Collaborator

If you're looking at the reward graphs, the reward is the percent reduction in size (for inlining) or the percent reduction in cycles/time (for regalloc). A negative reward just means that whatever was being measured regressed, which is pretty common in the initial stages of training, especially without warmstarting.

The normal way to run these scripts when time is sensitive is to start them, check TensorBoard until you're happy with the model's convergence, and then Ctrl+C the training process. It doesn't really make sense from an ML perspective to have a threshold in the gin files to stop training, because you don't know beforehand how long the model is going to take to converge or what level it will reach. In a continuous training scenario (i.e. redeploying new models every X weeks/months), you would just set the total number of steps to a reasonable amount above what it takes to converge and train for that long.

@mtrofin
Collaborator

mtrofin commented Feb 26, 2025

I think adding a threshold would be fine - basically, why not support both ways (i.e. if folks want a threshold -> fine; if not -> also fine)? If it turns out the threshold isn't really useful, it doesn't seem like something that would be hard to live with, or to remove.

@DataCorrupted would that work?

Separately, about iterations - it should also be possible to set the iteration count to something economical (small-ish, like a few 100K), then restart training using the model that came out of the first set of iterations as a warmstart.

I'm wondering - @alekh @boomanaiden154 - would it be a bad idea to optionally, automatically monitor the rate of improvement in the reward and say something like "if it's not improving by more than X after some number of iterations -> stop"?
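Something along these lines, say - a hypothetical patience-style check on the (averaged) reward rather than the loss; none of these names exist in the repo:

class RewardPlateauDetector:
  """Flags a stop once the best reward hasn't improved by more than
  min_delta for `patience` consecutive checks."""

  def __init__(self, min_delta: float, patience: int):
    self._min_delta = min_delta
    self._patience = patience
    self._best = float('-inf')
    self._stale_checks = 0

  def should_stop(self, reward: float) -> bool:
    """Records the latest reward; returns True once improvement has stalled."""
    if reward > self._best + self._min_delta:
      self._best = reward
      self._stale_checks = 0
    else:
      self._stale_checks += 1
    return self._stale_checks >= self._patience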

@DataCorrupted
Author

That sounds good to me.

Combining your first two thoughts, maybe we could do: "default to X iterations unless the user specifies a loss threshold (via one flag) or a number of iterations (via another flag)", roughly as sketched below.
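For example, with two hypothetical absl flags (the names are made up here; the repo's actual flags differ), keeping the iteration count as the default cap and the loss threshold as an opt-in:

from absl import flags

FLAGS = flags.FLAGS

# Hypothetical flags for illustration only.
flags.DEFINE_integer('num_iterations', 100000,
                     'Maximum number of training iterations (the default cap).')
flags.DEFINE_float('loss_threshold', None,
                   'If set, stop early once the running-average loss drops '
                   'below this value; the iteration cap still applies.')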

@alekh
Collaborator

alekh commented Feb 26, 2025 via email

@mtrofin
Collaborator

mtrofin commented Feb 26, 2025

Ack - I was thinking of the heuristic as more of a long-term thing. Short term, though, @DataCorrupted, if you have a patch for the loss_threshold, I'm happy to review it. Once you've gotten some mileage out of it, it'd be great if you could comment on its usefulness - maybe your setup is different from what we're doing (which is basically what @boomanaiden154 described), and it'd be good to learn from your scenario.
