Protein Graph Library
This package provides functionality for producing a number of types of graph-based representations of proteins. We provide compatibility with standard formats, as well as graph objects designed for ease of use with popular deep learning libraries.
from graphein.construct_graphs import ProteinGraph
from graphein.construct_meshes import ProteinMesh
# Initialise ProteinGraph class
pg = ProteinGraph(granularity='CA', insertions=False, keep_hets=True,
node_featuriser='meiler', get_contacts_path='/Users/arianjamasb/github/getcontacts',
pdb_dir='examples/pdbs/',
contacts_dir='examples/contacts/',
exclude_waters=True, covalent_bonds=False, include_ss=True)
# Create residue-level graphs. Chain selection is either 'all' or a list e.g. ['A', 'B', 'D'] specifying the polypeptide chains to capture
# DGLGraph From PDB Accession Number
graph = pg.dgl_graph_from_pdb_code('3eiy', chain_selection='all')
# DGLGraph From PDB file
graph = pg.dgl_graph_from_pdb_file('examples/pdbs/3eiy.pdb', chain_selection='all')
# Create atom-level graphs
graph = pg.make_atom_graph(pdb_code='3eiy', graph_type='bigraph')
# Initialise ProteinMesh class
pm = ProteinMesh()
# Pytorch3D Mesh Object from PDB Code
verts, faces, aux = pm.create_mesh(pdb_code='3eiy', out_dir='examples/meshes/')
# Pytorch3D Mesh Object from PDB File
verts, faces, aux = pm.create_mesh(pdb_file='examples/pdbs/pdb3eiy.pdb')
Graphs can be constructed according to walks through the graph in the figure below.
granularity: {'CA', 'CB', 'atom'} - specifies node-level granularity of graph
insertions: bool - keep atoms with multiple insertion positions
keep_hets: bool - keep hetatoms
node_featuriser: {'meiler', 'kidera'} low-dimensional embeddings of AA physico-chemical properties
pdb_dir: path to pdb files
contacts_dir: path to contact files generated by get_contacts
get_contacts_path: path to GetContacts installation
exclude_waters: bool - retain structural waters
covalent_bonds: bool - maintain covalent bond edges or just use intramolecular interactions
include_ss: bool - calculate protein SS and surface features using DSSP and assign them as node features
-
Create env
conda create --name graphein python=3.7 conda activate graphein
-
Install Get Contacts
# Install get_contact_ticc.py dependencies $ conda install scipy numpy scikit-learn matplotlib pandas cython seaborn $ pip install ticc==0.1.4 # Install vmd-python dependencies $ conda install netcdf4 numpy pandas seaborn expat tk=8.5 # Alternatively use pip $ brew install netcdf pyqt # Assumes https://brew.sh/ is installed # Install vmd-python library $ conda install vmd-python # Set up getcontacts library $ git clone https://github.com/getcontacts/getcontacts.git $ echo "export PATH=`pwd`/getcontacts:\$PATH" >> ~/.bash_profile $ source ~/.bash_profile # Test installation $ cd getcontacts/example/5xnd $ get_dynamic_contacts.py --topology 5xnd_topology.pdb \ --trajectory 5xnd_trajectory.dcd \ --itypes hb \ --output 5xnd_hbonds.tsv
-
Install Biopython
conda install biopython
-
Install RDKit
conda install -c conda-forge rdkit==2018.09.3
-
Install DSSP
We use DSSP for computing some protein features
$ conda install -c salilab dssp
-
Install PyTorch Geometric
$ pip install torch-scatter $ pip install torch-sparse $ pip install torch-spline-conv $ pip install torch-cluster $ pip install torch-geometric
N.B. Follow the instructions in the Torch-Geometric Docs to install the versions appropriate to your CUDA version.
-
Install IPyMol
$ git clone https://github.com/cxhernandez/ipymol $ cd ipymol $ pip install .
N.B. The PyPi package seems to be behind the github repo. We require functionality that is not present in the PyPi package.
-
Install DGL and associated libraries
$ conda install -c dglteam dgl $ conda install -c dglteam dgllife
-
Install graphein
$ git clone https://www.github.com/a-r-j/graphein $ cd graphein $ pip install -e .
Please consider citing graphein if it proves useful in your work.
@article{Jamasb2020,
doi = {10.1101/2020.07.15.204701},
url = {https://doi.org/10.1101/2020.07.15.204701},
year = {2020},
month = jul,
publisher = {Cold Spring Harbor Laboratory},
author = {Arian Rokkum Jamasb and Pietro Lio and Tom Blundell},
title = {Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Protein Structures}
}