Different unique pair number for SetFitTrainer.train and Trainer.hyperparameter_search with same args #545

Open
HexadimensionalerAlp opened this issue Jul 25, 2024 · 0 comments

Hi, I trained a model with SetFitTrainer and afterwards started a hyperparameter optimization with the same parameters for comparison. The expected behaviour would be that both runs have the same number of unique pairs and therefore take roughly the same time. In reality, however, the direct train approach had 64240 unique pairs, 4015 optimization steps and took 30 minutes per epoch, while the hyperparameter optimization had 2039350 unique pairs, 127460 optimization steps and was about to take 19 hours.
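For reference, a rough back-of-the-envelope check of the numbers from the direct training run. This assumes the usual SetFit pair generation of 2 * num_iterations pairs per training sample (20 positive and 20 negative); the sample count is hypothetical, back-calculated from the reported pair count:

num_iterations = 20
batch_size = 16
num_train_samples = 1606  # hypothetical, inferred from 64240 / (2 * 20)

pairs_direct = 2 * num_iterations * num_train_samples
steps_direct = pairs_direct // batch_size

print(pairs_direct)  # 64240, matches the direct trainer.train() run
print(steps_direct)  # 4015, matches the reported optimization steps

The hyperparameter search run clearly does not follow this formula, which is what this issue is about.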

Training task:

from setfit import SetFitModel, SetFitTrainer
from sentence_transformers.losses import CosineSimilarityLoss

model = SetFitModel.from_pretrained(
    'sentence-transformers/paraphrase-mpnet-base-v2',
    multi_target_strategy='multi-output'
)

trainer = SetFitTrainer(
    model=model,
    train_dataset=datasets['train'],
    eval_dataset=datasets['validation'],
    loss_class=CosineSimilarityLoss,
    batch_size=16,
    num_iterations=20,
    num_epochs=1
)

trainer.train()

Optimization task:

from typing import Any, Dict, Union

from optuna import Trial
from setfit import SetFitModel, Trainer


def model_init(params: Dict[str, Any]) -> SetFitModel:
    params = params or {}
    max_iter = params.get('max_iter', 100)
    solver = params.get('solver', 'liblinear')
    params = {
        'head_params': {
            'max_iter': max_iter,
            'solver': solver
        }
    }

    return SetFitModel.from_pretrained(
        'sentence-transformers/paraphrase-mpnet-base-v2',
        multi_target_strategy='multi-output',
        **params
    )


def hp_space(trial: Trial) -> Dict[str, Union[float, int, str]]:
    return {
        "body_learning_rate": trial.suggest_float("body_learning_rate", 1e-5, 1e-5, log=True),
        "num_epochs": trial.suggest_int("num_epochs", 1, 1),
        "batch_size": trial.suggest_categorical("batch_size", [16]),
        "seed": trial.suggest_int("seed", 42, 42),
        "max_iter": trial.suggest_int("max_iter", 20, 20),
        "solver": trial.suggest_categorical("solver", ["liblinear"]),
    }


trainer = Trainer(
    train_dataset=datasets['train'],
    eval_dataset=datasets['validation'],
    model_init=model_init
)

best_run = trainer.hyperparameter_search(direction="maximize", hp_space=hp_space, n_trials=1)

The data consists of datasets with the columns 'text' and 'label', where 'text' is a string and 'label' is a tensor of the following format: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]. That should not be relevant for this issue, though.
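For illustration, a minimal sketch of what such a dataset could look like (the texts and label vectors here are made up):

from datasets import Dataset

# Hypothetical example rows; 'label' is an 11-dimensional one-hot float vector.
datasets = {
    'train': Dataset.from_dict({
        'text': ['first example sentence', 'second example sentence'],
        'label': [
            [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        ],
    }),
    'validation': Dataset.from_dict({
        'text': ['a validation sentence'],
        'label': [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
    }),
}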

In my understanding, both of them should be comparable in the complexity of the training task, as the parameters used are the same. What is the explanation for this behaviour, and is there a way to recreate the setup of the direct training run in the hyperparameter optimization?

Thank you in advance!
