forked from facebookresearch/esm
Code release for "Language Models Generalize Beyond Natural Proteins". (facebookresearch#527) Co-authored-by: kwanUm <[email protected]> Co-authored-by: Tom Sercu <[email protected]>
1 parent c7ba180, commit d6a2a8b
Showing 33 changed files with 3,475 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# LM design examples | ||
|
||
This folder contains code demonstrating protein design with a language model. The code was used to perform the two design tasks described in the paper [Language models generalize beyond natural proteins | ||
](https://www.biorxiv.org/content/10.1101/2022.12.21.521521v1). | ||
|
||
|
||
## Notebook examples | ||
|
||
Refer to the two notebooks in this folder to run the fixed backbone and free generation design tasks. | ||
|
||
|
||
## Shell examples | ||
|
||
To run the two design tasks from the shell, do the following: | ||
|
||
1. Install the additional requirements: ```pip install -r additional_requirements.txt``` | ||
2. Run fixed backbone design: ```python -m lm_design task=fixedbb pdb_fn=$PWD/2N2U.pdb``` | ||
3. Run free generation design: ```python -m lm_design task=free_generation``` | ||
|
||
Notes: | ||
Use the ```seed=<number>``` override to generate different designs, e.g.: | ||
```python -m lm_design task=free_generation seed=42``` | ||
|
||
Control the generated length in free generation using ```free_generation_length=<number>```, e.g.: | ||
```python -m lm_design task=free_generation free_generation_length=68``` | ||
|
||
Other, more advanced configuration options can be found in [config.yaml](conf/config.yaml) | ||
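
The `seed=` and `free_generation_length=` options shown in the notes can be combined to sweep over several designs. The following is a minimal sketch, assuming `python -m lm_design` is run from this folder; `build_command` and `run_sweep` are hypothetical helper names and are not part of the release.

```python
import subprocess


def build_command(task, seed=None, free_generation_length=None, pdb_fn=None):
    """Build the argument list for one `python -m lm_design` run.

    Only the overrides that appear in the README are emitted.
    """
    cmd = ["python", "-m", "lm_design", f"task={task}"]
    if pdb_fn is not None:
        cmd.append(f"pdb_fn={pdb_fn}")
    if seed is not None:
        cmd.append(f"seed={seed}")
    if free_generation_length is not None:
        cmd.append(f"free_generation_length={free_generation_length}")
    return cmd


def run_sweep(task, seeds, **kwargs):
    """Run one design per seed, sequentially; stops at the first failing run."""
    for seed in seeds:
        subprocess.run(build_command(task, seed=seed, **kwargs), check=True)


if __name__ == "__main__":
    # Print the commands instead of executing them; call run_sweep(...) to execute.
    for seed in (0, 1, 2):
        print(" ".join(build_command("free_generation", seed=seed, free_generation_length=68)))
```

Each run is independent, so the printed commands could equally be dispatched to separate GPUs or cluster jobs.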
|
||
|
||
## Paper data | ||
The data from the preprint is available under [paper-data/](paper-data). | ||
This includes designed sequences, their predicted structures, experimental validation results, linear projection for pairwise distance prediction, and details on dataset construction for model training. |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
nltk | ||
py3Dmol | ||
hydra-core |
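
Before running the examples, it can help to confirm that the listed dependencies are importable. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper. Note that Hydra is published on PyPI as `hydra-core` but imported as `hydra`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names for the packages listed in additional_requirements.txt.
    missing = missing_modules(["nltk", "py3Dmol", "hydra"])
    print("missing:", missing or "none")
```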
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# Copyright (c) Meta Platforms, Inc. and affiliates. | ||
# | ||
# This source code is licensed under the MIT license found in the | ||
# LICENSE file in the root directory of this source tree. | ||
# | ||
seed: 0 | ||
num_seqs: 1 | ||
test_mode: False | ||
allow_missing_residue_coords: True | ||
suppress_AA: 'C' | ||
disable_cuda: False | ||
cuda_device_idx: # Set to a numeric value to override the default GPU device used. | ||
task: free_generation # fixedbb or free_generation | ||
pdb_fn: # set as empty string when using free_generation | ||
free_generation_length: 100 | ||
|
||
tasks: | ||
  free_generation: | ||
    num_iter: 170000 | ||
    resample_y_every: 3 | ||
    resample_y_temp: 1 | ||
    stage_fixedbb_args: ${tasks.fixedbb} | ||
|
||
|
||
  fixedbb: | ||
    num_iter: 170000 | ||
|
||
    # Accept/Reject | ||
    accept_reject: | ||
      energy_cfg: | ||
        struct_w: 3 | ||
        LM_w: 2 | ||
        ngram_w: 1 | ||
        ngram_orders: [1,2,3] | ||
      temperature: | ||
        scheduler: StepLR | ||
        step_size: 10000 | ||
        gamma: 0.5 | ||
        initial: 8 | ||
|
||
|
||
|
||
# Hydra config | ||
hydra: | ||
  job_logging: | ||
    formatters: | ||
      colorlog: | ||
        datefmt: "%m-%d %H:%M:%S" | ||
    handlers: | ||
      file: | ||
        class: logging.FileHandler | ||
        mode: w | ||
        filename: logging.l | ||
      console: | ||
        class: logging.StreamHandler | ||
        stream: ext://sys.stdout | ||
|
||
  hydra_logging: | ||
    handlers: | ||
      console: | ||
        class: logging.StreamHandler | ||
        stream: ext://sys.stdout |
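
How the `accept_reject` settings in this config might interact can be illustrated with a small simulated-annealing sketch. It assumes that `StepLR` has the usual step-decay meaning (the temperature is multiplied by `gamma` every `step_size` iterations), that the three energy weights form a weighted sum, and that proposals are accepted with the Metropolis criterion. None of these assumptions is confirmed by the config file itself, and all function names are hypothetical.

```python
import math
import random

# Values copied from the accept_reject block of config.yaml.
INITIAL_T, STEP_SIZE, GAMMA = 8.0, 10000, 0.5
WEIGHTS = {"struct": 3.0, "LM": 2.0, "ngram": 1.0}


def temperature(step):
    """Step decay (assumed): scale the temperature by GAMMA every STEP_SIZE iterations."""
    return INITIAL_T * GAMMA ** (step // STEP_SIZE)


def total_energy(terms):
    """Weighted sum of per-term energies (assumed combination rule)."""
    return sum(WEIGHTS[name] * value for name, value in terms.items())


def accept(e_old, e_new, step, rng=random):
    """Metropolis criterion: always accept improvements, otherwise accept
    with probability exp(-(e_new - e_old) / T)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / temperature(step))


if __name__ == "__main__":
    for step in (0, 10000, 50000, 169999):
        print(step, temperature(step))
```

Under these assumptions the temperature halves 16 times over the 170,000 iterations, falling from 8 to roughly 1.2e-4, so late iterations accept almost nothing but improvements.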
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,197 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "4f84941e", | ||
"metadata": {}, | ||
"source": [ | ||
"# Fixed Backbone design from LM\n", | ||
"\n", | ||
"This notebook demonstrates the Fixed Backbone design task from the paper [Language models generalize beyond natural proteins\n", | ||
"](https://www.biorxiv.org/content/10.1101/2022.12.21.521521v1).\n", | ||
"\n", | ||
"Given an input structure as a .pdb file, the LM is used iteratively in an MCMC optimization to find a sequence that folds to that structure.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "d378b7f4-0792-446b-9e95-f7025bee5bec", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# First install additional dependencies\n", | ||
"!pip install -r additional_requirements.txt\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "cfd13d6a", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Imports\n", | ||
"import os\n", | ||
"import time\n", | ||
"import hydra\n", | ||
"import py3Dmol\n", | ||
"from lm_design import Designer\n", | ||
"\n", | ||
"# Params\n", | ||
"pdb_fn = os.getcwd() + '/2N2U.pdb'\n", | ||
"seed = 0 # Use different seeds to get different sequence designs for the same structure\n", | ||
"TASK = \"fixedbb\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "989996bf", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Load hydra config from config.yaml\n", | ||
"with hydra.initialize_config_module(config_module=\"conf\"):\n", | ||
"    cfg = hydra.compose(\n", | ||
"        config_name=\"config\",\n", | ||
"        overrides=[\n", | ||
"            f\"task={TASK}\",\n", | ||
"            f\"seed={seed}\",\n", | ||
"            f\"pdb_fn={pdb_fn}\",\n", | ||
"            # 'tasks.fixedbb.num_iter=100' # DEBUG - use a smaller number of iterations\n", | ||
"        ])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "63178538", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Create a designer from configuration\n", | ||
"des = Designer(cfg, pdb_fn)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "86d25575", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"# Run the designer\n", | ||
"start_time = time.time()\n", | ||
"des.run_from_cfg()\n", | ||
"print(\"finished after %.2f hours\" % ((time.time() - start_time) / 3600))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "d6d9f742", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"print(\"Output seq:\", des.output_seq)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "ba6c8c66", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"# Fold the output sequence with the ESMFold API\n", | ||
"output_seq = des.output_seq\n", | ||
"# Equivalent curl call:\n", | ||
"# curl -X POST --data \"GENGEIPLEIRATTGAEVDTRAVTAVEMTEGTLGIFRLPEEDYTALENFRYNRVAGENWKPASTVIYVGGTYARLCAYAPYNSVEFKNSSLKTEAGLTMQTYAAEKDMRFAVSGGDEVWKKTPTANFELKRAYARLVLSVVRDATYPNTCKITKAKIEAFTGNIITANTVDISTGTEGSGTQTPQYIHTVTTGLKDGFAIGLPQQTFSGGVVLTLTVDGMEYSVTIPANKLSTFVRGTKYIVSLAVKGGKLTLMSDKILIDKDWAEVQTGTGGSGDDYDTSFN\" https://api.esmatlas.com/foldSequence/v1/pdb/\n", | ||
"import requests\n", | ||
"url = 'https://api.esmatlas.com/foldSequence/v1/pdb/'\n", | ||
"r = requests.post(url, data=output_seq)\n", | ||
"r.raise_for_status()  # fail loudly if the API call did not succeed\n", | ||
"output_struct = r.text\n", | ||
"\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "d5c06ab3", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Visualize output structure\n", | ||
"view = py3Dmol.view(width=800, height=800)\n", | ||
"view.addModel(output_struct, 'pdb')\n", | ||
"view.setStyle({'cartoon': {'color': 'spectrum'}})\n", | ||
"view.zoomTo()\n", | ||
"view.show()\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b7247225", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"des.x_logits.shape" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "d8e5c184", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Visualize wild type structure\n", | ||
"wt_struct_file = pdb_fn\n", | ||
"view = py3Dmol.view(width=800, height=800)\n", | ||
"with open(wt_struct_file) as f:  # close the file handle after reading\n", | ||
"    view.addModel(f.read(), 'pdb')\n", | ||
"view.setStyle({'cartoon': {'color': 'spectrum'}})\n", | ||
"view.zoomTo()\n", | ||
"view.show()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "222ec344", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.7.12" | ||
}, | ||
"vscode": { | ||
"interpreter": { | ||
"hash": "5502aca739f2549ad2771378ffc455b2bbb8b06f1a91617971f7097758a3cf84" | ||
} | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
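
The folding step in the notebook can also be wrapped in a small function with input validation and basic error handling. This is a sketch using only the standard library rather than `requests`; the endpoint URL is the one used in the notebook, while `fold_sequence`, `save_pdb`, the timeout value, and the "looks like a PDB" check are illustrative assumptions, not part of the release or of the API's documented behaviour.

```python
import urllib.request

# Endpoint taken from the notebook.
ESMFOLD_URL = "https://api.esmatlas.com/foldSequence/v1/pdb/"


def fold_sequence(sequence, url=ESMFOLD_URL, timeout=120, opener=urllib.request.urlopen):
    """POST a one-letter amino-acid sequence and return the response body as text.

    `opener` is injectable so the function can be exercised without network access.
    """
    seq = sequence.strip().upper()
    if not (seq.isascii() and seq.isalpha()):
        raise ValueError("sequence must contain only ASCII letters")
    request = urllib.request.Request(url, data=seq.encode("ascii"), method="POST")
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with opener(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    if "ATOM" not in body:
        raise RuntimeError("response does not look like a PDB file")
    return body


def save_pdb(pdb_text, path):
    """Write the predicted structure to disk for later inspection."""
    with open(path, "w") as handle:
        handle.write(pdb_text)
```

In the notebook this would be used as `output_struct = fold_sequence(des.output_seq)`, after which `output_struct` can be passed to `view.addModel(output_struct, 'pdb')` as before.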