# get esm
conda create -n spurs python=3.7 pip
conda activate spurs
pip install -e .
pip install git+https://github.com/facebookresearch/esm.git
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
Download data.tar.gz from the link, then extract it:
tar -xzvf data.tar.gz
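After extraction, it can help to confirm that the paths used later in this README are present. The following is a minimal sketch, not part of the repository: `missing_paths` is a hypothetical helper, and it assumes the archive unpacks into a `data/` directory next to the scripts.

```python
import os

# Paths referenced elsewhere in this README. That the archive unpacks into
# a top-level "data" directory is an assumption, not something stated here.
EXPECTED_PATHS = (
    "data/checkpoints/spurs",
    "data/checkpoints/ThermoMPNN",
    "data/inference_example/DOCK1_MOUSE.pdb",
)


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    return [p for p in EXPECTED_PATHS if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing after extraction:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected paths found.")
```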
Evaluation on the test sets:
# General usage: only the model checkpoint and the test dataset need to be specified.
python ./test.py experiment_path={checkpoint_path} datamodule._target_={dataset_name} data_split=test ckpt_path=best.ckpt mode=predict
Model checkpoints can be selected from data/checkpoints, and the datamodule can be either megascale or domainome.
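To avoid retyping the overrides, the general command can be assembled programmatically. This is a minimal sketch: `build_test_command` is a hypothetical helper, not part of the repository, and it only fills in the two placeholders from the general usage line.

```python
# Datamodule names mentioned in this README.
DATAMODULES = ("megascale", "domainome")


def build_test_command(checkpoint_path, dataset_name):
    """Return the test.py invocation as an argument list."""
    if dataset_name not in DATAMODULES:
        raise ValueError("unknown datamodule: " + dataset_name)
    return [
        "python",
        "./test.py",
        "experiment_path=" + checkpoint_path,
        "datamodule._target_=" + dataset_name,
        "data_split=test",
        "ckpt_path=best.ckpt",
        "mode=predict",
    ]


if __name__ == "__main__":
    # The two checkpoint/dataset pairs used as examples in this README.
    pairs = [
        ("data/checkpoints/spurs", "megascale"),
        ("data/checkpoints/ThermoMPNN", "domainome"),
    ]
    for checkpoint, dataset in pairs:
        print(" ".join(build_test_command(checkpoint, dataset)))
```

The returned list can be passed directly to `subprocess.run` if the evaluation should be launched from Python rather than printed.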
## SPURS on Megascale and ten test sets
python ./test.py experiment_path=data/checkpoints/spurs datamodule._target_=megascale data_split=test ckpt_path=best.ckpt mode=predict
## ThermoMPNN on Domainome
python ./test.py experiment_path=data/checkpoints/ThermoMPNN datamodule._target_=domainome data_split=test ckpt_path=best.ckpt mode=predict
Results on Megascale and the ten test sets can be processed using convert.ipynb.
An example can be found in functional_site_identification.ipynb. This is the result shown in Fig. 3h of the paper (UniProt ID: P00327, PDB ID: 1QLH).
To run fitness prediction:
cd combining-evolutionary-and-assay-labelled-data
export PROJECT_ROOT=$PWD/../
python run_proteingym.py
By default, this command uses all accessible CPU cores. To restrict it to a specific range of CPUs, such as CPUs 0-80, use taskset:
taskset -c 0-80 python run_proteingym.py
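The same restriction can be applied from inside Python on Linux. The sketch below is an alternative to `taskset`, not something the repository provides: `parse_cpu_range` and `restrict_to` are hypothetical helpers, and `os.sched_setaffinity` is only available on platforms that support it (notably Linux). Stride syntax such as `0-10:2` is not handled.

```python
import os


def parse_cpu_range(spec):
    """Parse a taskset-style list such as '0-80' or '0-3,8' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError("invalid CPU range: " + part)
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


def restrict_to(spec):
    """Pin the current process to the requested CPUs that are actually available."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not supported on this platform")
    available = os.sched_getaffinity(0)
    wanted = parse_cpu_range(spec) & available
    if not wanted:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, wanted)
    return wanted
```

Calling `restrict_to("0-80")` before any worker processes are started should be enough, since child processes inherit the CPU affinity of their parent on Linux.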
SPURS-augmented models were built upon the Augmented models framework (Hsu et al., Nat Biotechnol, 2022). We adapted the code from the original GitHub repo (commit fdaa5bb) and retained only the necessary files. A DDGPredictor is added to introduce predicted ddG into the regression model.
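One way to picture how a predicted ddG might enter a regression model is as an extra feature column next to a one-hot sequence encoding. The sketch below illustrates only that idea. It is not the repository's DDGPredictor, whose actual interface is not described here; `one_hot` and `augmented_features` are hypothetical names, and the amino acid ordering is an assumption.

```python
# Assumed ordering of the 20 standard amino acids (illustrative only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Flatten a sequence into a one-hot vector of length 20 * len(sequence)."""
    features = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(residue)] = 1.0
        features.extend(row)
    return features


def augmented_features(sequence, predicted_ddg):
    """Append a predicted ddG value as one extra feature after the one-hot encoding."""
    return one_hot(sequence) + [float(predicted_ddg)]


if __name__ == "__main__":
    # Toy three-residue variant with a made-up ddG value, for illustration only.
    vector = augmented_features("ACD", -1.5)
    print(len(vector), vector[-1])
```

A regression model fitted on such vectors can learn a single weight for the ddG column alongside the site-specific weights.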
from spurs.inference import get_SPURS, parse_pdb
# Loading the model takes ~10 s
model, cfg = get_SPURS('./data/checkpoints/spurs')
pdb_name = 'DOCK1_MOUSE'
pdb_path = './data/inference_example/' + pdb_name + '.pdb'
chain = 'A'
pdb = parse_pdb(pdb_path, pdb_name, chain, cfg)
# Inference takes ~1 s
result = model(pdb, return_logist=True)
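The layout of `result` is not described in this section. The sketch below assumes it can be converted to an L x 20 table (one row per residue, one column per amino acid) with columns ordered as in `ALPHABET`. Both the layout and the ordering are assumptions to check against the model output, and `lookup_ddg` is a hypothetical helper.

```python
# Assumed column order of the 20 amino acids; not stated in this section.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def lookup_ddg(ddg_table, mutation):
    """Return the table entry for a mutation such as 'A12G' (1-based position)."""
    wild_type, mutant = mutation[0], mutation[-1]
    position = int(mutation[1:-1])
    if wild_type not in ALPHABET or mutant not in ALPHABET:
        raise ValueError("unknown amino acid in mutation: " + mutation)
    if not 1 <= position <= len(ddg_table):
        raise IndexError("position out of range: " + str(position))
    return ddg_table[position - 1][ALPHABET.index(mutant)]


if __name__ == "__main__":
    # Placeholder 3 x 20 table filled with zeros, standing in for real output.
    table = [[0.0] * len(ALPHABET) for _ in range(3)]
    print(lookup_ddg(table, "A2G"))
```

If `result` turns out to be a tensor-like object, something like `result.tolist()` could produce such a table; that conversion is an assumption as well.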