Hi, I have a corpus of about 500,000 protein sequences and would like to use them with existing models, such as ESM2 or the one in this repo, to predict the fitness effect of substituting one amino acid for another.
How could I incorporate my sequences into the models referred to in this repo and then use the resulting model for that task? Thanks.
Hi @avilella -- do you have property annotations for these 500k sequences, or just the amino acid sequences with no annotation? ProteinNPT is first and foremost a model that learns a joint distribution over sequences and their corresponding labels, so it is not well suited to your setting if there are no such labels/annotations.
If there are no labels, you may be interested in the various zero-shot baselines we have integrated into the ProteinGym benchmark.
Best,
Pascal
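For the no-label case, one widely used zero-shot approach with masked language models such as ESM2 is "masked-marginal" scoring: mask the mutated position, then compare the model's log-probability of the mutant residue with that of the wild-type residue. The sketch below shows only that scoring rule, under stated assumptions. It assumes you have already obtained per-position amino-acid log-probabilities from a model (how to get them is not shown), that mutants use the common `A25G` notation with 1-based positions and multiple substitutions joined by `:`, and that multiple substitutions combine additively. The names `parse_mutant` and `masked_marginal_score` are hypothetical helpers, not part of ProteinGym or ProteinNPT, and the probabilities in the example are made up.

```python
import math
from typing import Dict, List, Tuple

# Assumed input format: log_probs[i] maps amino acid -> log-probability that a
# masked LM (e.g. ESM2) assigns at 0-based position i when that position is masked.
LogProbs = List[Dict[str, float]]


def parse_mutant(mutant: str) -> List[Tuple[str, int, str]]:
    """Parse 'A25G' or 'A25G:D30N' (1-based positions) into (wt, pos, mt) tuples."""
    subs = []
    for token in mutant.split(":"):
        wt, pos, mt = token[0], int(token[1:-1]), token[-1]
        subs.append((wt, pos, mt))
    return subs


def masked_marginal_score(sequence: str, mutant: str, log_probs: LogProbs) -> float:
    """Sum of log p(mt) - log p(wt) over substitutions (additivity is an assumption)."""
    score = 0.0
    for wt, pos, mt in parse_mutant(mutant):
        idx = pos - 1  # convert 1-based mutant notation to 0-based index
        if sequence[idx] != wt:
            raise ValueError(
                f"wild-type mismatch at position {pos}: expected {wt}, found {sequence[idx]}"
            )
        score += log_probs[idx][mt] - log_probs[idx][wt]
    return score


if __name__ == "__main__":
    # Toy, made-up probabilities for illustration only (not real model output).
    toy = [
        {"M": math.log(0.9), "L": math.log(0.1)},
        {"K": math.log(0.6), "R": math.log(0.3), "E": math.log(0.1)},
        {"V": math.log(0.5), "I": math.log(0.5)},
    ]
    print(masked_marginal_score("MKV", "K2R", toy))
```

Running the toy example prints a value of about -0.693 (that is, log 0.5): the mutant residue R is half as likely as the wild-type K at that position, so the score is negative, and more negative scores are read as more likely to be deleterious.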