Hi, I trained a model with SetFitTrainer and afterwards ran a hyperparameter optimization with the same parameters as a sanity check. I expected both runs to generate the same number of unique pairs and therefore to take roughly the same time. In reality, the direct training run had 64240 unique pairs and 4015 optimization steps and took 30 minutes per epoch, while the optimization run had 2039350 unique pairs and 127460 optimization steps and was projected to take 19 hours.
Training task:
Optimization task:
The data consists of datasets with the columns 'text' and 'label', where 'text' is a string and 'label' is a tensor of the following format: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]. That should not be relevant to this issue, though.
In my understanding, both runs should be comparable in training complexity, since the parameters used are the same. What is the explanation for this behaviour, and is there a way to reproduce the setup of the direct training run in the optimization run?
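For what it's worth, the reported step counts in both runs are consistent with the pair counts under a batch size of 16 (an assumption on my part, inferred from the numbers), so the discrepancy seems to come entirely from how many pairs are generated, not from the batching:

```python
import math

BATCH_SIZE = 16  # assumption: same batch size in both runs, inferred from the reported numbers


def steps_per_epoch(num_pairs: int, batch_size: int) -> int:
    """Optimization steps needed to consume all generated pairs once."""
    return math.ceil(num_pairs / batch_size)


# Direct training run: 64240 pairs -> 4015 steps
print(steps_per_epoch(64240, BATCH_SIZE))    # 4015
# Hyperparameter optimization run: 2039350 pairs -> 127460 steps
print(steps_per_epoch(2039350, BATCH_SIZE))  # 127460
```

So the 2039350 / 64240 ≈ 32x blow-up in pairs translates directly into the ~32x more steps (and the 30 minutes vs. 19 hours runtime).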
Thank you in advance!