
An automated Python pipeline designed to extract dynamic features from large ensembles of molecular dynamics (MD) trajectories


miemiemmmm/Nearl


NEARL

NEARL (Nanoscale Environment Assessment and Resonance Landscapes) is a 3D structural data generation framework that featurizes biomolecules, focusing specifically on their 3D coordinates and protein dynamics, so that users can benefit from recent developments in machine learning algorithms.

  • Obtain and embed molecule blocks from 3D molecular structures
  • Load an arbitrary number of 3D structures into a trajectory container
  • Multiple predefined 2D or 3D features for featurization
  • A pipeline for featurizing the trajectory container


Installation


Clone the repository

git clone https://github.com/miemiemmmm/NEARL.git
cd NEARL

Manage your python environment

Mamba is a Python package manager implemented in C++ that aims to provide all the functionality of Conda at higher speed. Micromamba is a lighter-weight version of Mamba that provides a minimal, fast, standalone executable for environment and package management. Both can be used as drop-in replacements for Conda, and we recommend using micromamba to manage the Python environment.
If micromamba is not installed on your system, you can install it with the following script:

# The following command downloads micromamba to /home/myname/micromamba/bin and generates a loadmamba script
bash scripts/install_mamba.sh /home/myname/micromamba 

# Use the following command to configure the shell to use micromamba. 
# To load micromamba upon starting a new shell, add this line to .bashrc or .zshrc
source /home/myname/micromamba/bin/loadmamba

Create a test environment

Load micromamba and create a new environment named NEARL:

bash scripts/create_env_mamba.sh NEARL jax
micromamba activate NEARL

Install NEARL

NEARL currently supports only the Linux platform. Installing from PyPI is recommended:
pip install nearl
By default, NEARL uses OpenMP for feature density interpolation, and some key components are accelerated with OpenACC. Installing the GPU version requires the NVIDIA HPC SDK.
To install the GPU version, run the following command from the root of the cloned repository:


pip install .

Test the installation

Activate the new NEARL environment and run the following commands to test the installation:

# To test the featurizer: 
python -c "from nearl import tests; tests.vectorize()"
# To test some simple models:
python -c "from nearl import tests; tests.jax_2dcnn()"  

Get started


import nearl as nl
_trajfile, _topfile = nl.data.MINI_TRAJ
_parms = nl.data.MINI_PARMS
loader = nl.io.TrajectoryLoader(_trajfile, _topfile)
feat = nl.features.Featurizer3D(_parms)
feat.register_feature(nl.features.Mass())
......
......

Trajectory loader


Load structures into trajectory container

NEARL treats every 3D structure as a trajectory rather than as a separate molecule, and uses pytraj as the backend for trajectory processing.

The trajectory loader currently supports the following formats: NetCDF, PDB, XTC, TRR, DCD, and h5/hdf.
The loader normally reads trajectory/topology pairs.

from nearl import trajloader
traj_list = ["traj1.nc", "traj2.nc", "traj3.nc"]   # hypothetical trajectory files
top_list = ["top1.pdb", "top2.pdb", "top3.pdb"]    # matching topology files, in the same order
# Additional keyword arguments (**kwargs) can be passed to the loader
traj_loader = trajloader.TrajectoryLoader(traj_list, top_list)
for traj in traj_loader:
  # Do something with traj
  pass
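The loader's core bookkeeping, pairing each trajectory with its topology by position, can be sketched in plain Python. `pair_trajectories` and the extension set below are hypothetical illustrations, not part of NEARL; the real `TrajectoryLoader` relies on pytraj for reading the files.

```python
import os
from typing import Iterator, List, Tuple

# Assumed file extensions for the formats listed above (hypothetical mapping)
SUPPORTED_EXTENSIONS = {".nc", ".pdb", ".xtc", ".trr", ".dcd", ".h5", ".hdf"}


def pair_trajectories(trajs: List[str], tops: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (trajectory, topology) pairs in order, after basic validation."""
    if len(trajs) != len(tops):
        raise ValueError(f"{len(trajs)} trajectories but {len(tops)} topologies")
    for traj, top in zip(trajs, tops):
        ext = os.path.splitext(traj)[1].lower()
        if ext not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"unsupported trajectory format: {traj}")
        yield traj, top


pairs = list(pair_trajectories(["run1.nc", "run2.xtc"], ["sys1.pdb", "sys2.pdb"]))
print(pairs)  # [('run1.nc', 'sys1.pdb'), ('run2.xtc', 'sys2.pdb')]
```

Because `pair_trajectories` is a generator, validation runs lazily when iteration starts, which mirrors the `for traj in traj_loader:` usage pattern.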

Static structures

A single snapshot from an MD simulation, or a static structure (such as a PDB file), is treated as a trajectory with only one frame. In this case, you only need to load the structures:

from nearl import trajloader
traj_list = ["pdb1.pdb", "pdb2.pdb", "pdb3.pdb"]   # hypothetical static structure files
traj_loader = trajloader.TrajectoryLoader(traj_list, range(len(traj_list)))
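Conceptually, a trajectory is a sequence of frames, each holding one xyz coordinate per atom, so a static structure is the special case of a single frame. The toy helper below (hypothetical, not a NEARL function) shows that frames × atoms × 3 layout.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def as_single_frame_trajectory(coords: List[Coord]) -> List[List[Coord]]:
    """Wrap one static structure (one xyz per atom) as a one-frame trajectory."""
    return [list(coords)]


structure = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]  # toy 3-atom structure
traj = as_single_frame_trajectory(structure)
print(len(traj), len(traj[0]))  # 1 3  (frames, atoms)
```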

Featurizer


Featurizer is the primary hook between features and trajectories.

Load trajectories into a container and register them with a featurizer

featurizer = nl.features.Featurizer3D()
......
......
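To illustrate the "hook" idea, here is a deliberately simplified featurizer that connects registered feature functions to selected frames of a trajectory. `MiniFeaturizer` is hypothetical: its method names echo the README's `register_feature`, `register_traj` and `register_frames`, but its signatures and behavior are assumptions, not the `Featurizer3D` API.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[Tuple[float, float, float]]  # one frame: xyz coordinates per atom


class MiniFeaturizer:
    """Hypothetical, simplified featurizer; NOT the NEARL Featurizer3D API."""

    def __init__(self) -> None:
        self.features: Dict[str, Callable[[Frame], float]] = {}
        self.trajectory: Sequence[Frame] = []
        self.frames: Sequence[int] = []

    def register_feature(self, name: str, func: Callable[[Frame], float]) -> None:
        self.features[name] = func

    def register_traj(self, trajectory: Sequence[Frame]) -> None:
        self.trajectory = trajectory

    def register_frames(self, frames: Sequence[int]) -> None:
        self.frames = frames

    def run(self) -> List[List[float]]:
        # One row per selected frame, one column per feature (in registration order)
        return [[func(self.trajectory[f]) for func in self.features.values()]
                for f in self.frames]


traj = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(5.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
]
feat = MiniFeaturizer()
feat.register_feature("n_atoms", len)
feat.register_feature("mean_x", lambda fr: sum(a[0] for a in fr) / len(fr))
feat.register_traj(traj)
feat.register_frames(range(2))
print(feat.run())  # [[2, 1.0], [2, 2.0]]
```

Keeping features, trajectory and frame selection as separate registrations lets the same set of features be reused across many trajectories.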

Start featurization

feat = nl.features.Featurizer3D()
feat.register_feature(nl.features.Mass())
feat.register_frames(range(100))
......

Register a feature to featurizer

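from nearl.featurizer import Featurizer
feat = Featurizer()
feat.register_feature(YourFeature())    # register a feature instance
feat.register_traj(trajectory)          # trajectory: e.g. one item yielded by the trajectory loader
feat.register_frames(range(100))        # featurize the first 100 frames
index_selected = trajectory.top.select(":LIG")   # atom indices of the ligand residue(s)
repr_traji, features_traji = feat.run_by_atom(index_selected, focus_mode="cog")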
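`focus_mode="cog"` presumably centres the featurization box on the center of geometry (COG) of the selected atoms. The sketch below computes an unweighted COG for a set of atom indices and the bounds of a cubic box around it. The helper names, box length and toy coordinates are hypothetical; NEARL's actual box and grid settings are presumably set through the parameters passed to the featurizer.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def center_of_geometry(frame: Sequence[Coord], indices: Sequence[int]) -> Coord:
    """Unweighted mean position of the selected atoms."""
    if not indices:
        raise ValueError("empty atom selection")
    n = len(indices)
    x = sum(frame[i][0] for i in indices) / n
    y = sum(frame[i][1] for i in indices) / n
    z = sum(frame[i][2] for i in indices) / n
    return (x, y, z)


def box_bounds(center: Coord, length: float) -> Tuple[Coord, Coord]:
    """Lower and upper corners of a cubic box of side `length` centred on `center`."""
    half = length / 2
    lower = tuple(c - half for c in center)
    upper = tuple(c + half for c in center)
    return lower, upper


frame = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (10.0, 10.0, 10.0)]
ligand = [0, 1, 2]  # hypothetical indices, standing in for trajectory.top.select(":LIG")
cog = center_of_geometry(frame, ligand)
print(cog)
print(box_bounds(cog, 4.0))
```

Using a mass-weighted center of mass instead would only change the averaging weights; the distant atom at index 3 is ignored because it is not part of the selection.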

View the example project featurizing a small subset of the PDBbind dataset in this script

Write your own feature

When defining a new feature, inherit from the base class Features and implement the feature method.

from nearl.features import Features

class YourFeature(Features):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # your own initialization
    def feature(self, *args, **kwargs):
        # compute your own feature here
        feature_vector = ...  # placeholder for the computed feature
        return feature_vector
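The subclass-and-override pattern can be tried without NEARL installed. In the sketch below, `FeatureBase` is a hypothetical stand-in for `nearl.features.Features` (its real constructor arguments and hooks are not shown here), and `MassFeature` returns one approximate atomic mass per atom.

```python
from typing import Dict, List, Sequence


class FeatureBase:
    """Hypothetical stand-in for nearl.features.Features (not the real class)."""

    def __init__(self, name: str = "feature"):
        self.name = name

    def feature(self, *args, **kwargs):
        raise NotImplementedError("subclasses must implement feature()")


# Approximate standard atomic masses in daltons for a few elements
MASSES: Dict[str, float] = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


class MassFeature(FeatureBase):
    """Returns one mass value per atom, i.e. a per-atom feature vector."""

    def __init__(self):
        super().__init__(name="mass")

    def feature(self, elements: Sequence[str]) -> List[float]:
        return [MASSES[e] for e in elements]


mass_feat = MassFeature()
print(mass_feat.name, mass_feat.feature(["C", "O", "H"]))  # mass [12.011, 15.999, 1.008]
```

Raising `NotImplementedError` in the base class makes a forgotten override fail loudly instead of silently returning nothing.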

Feature data deposition


NEARL supports the following features

  • TEMPLATE_STRING
......
......

Draw the hdf structure

  • Since temporal features are
from nearl import hdf
# output_hdffile: path to an existing HDF5 output file
with hdf.hdf_operator(output_hdffile, "r") as h5file:
    h5file.draw_structure();
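`draw_structure()` displays the group/dataset layout of the output file. The sketch below mimics that idea on a nested dict standing in for an HDF5 file: dicts play the role of groups and shape tuples the role of datasets. The layout is made up for illustration; with a real file, h5py's `Group.visititems` provides a comparable recursive traversal.

```python
from typing import Any, Dict, List


def draw_tree(node: Dict[str, Any], prefix: str = "") -> List[str]:
    """Render a nested dict as an indented tree: dicts are groups, other values are dataset shapes."""
    lines: List[str] = []
    for name, child in node.items():
        if isinstance(child, dict):
            lines.append(f"{prefix}{name}/")
            lines.extend(draw_tree(child, prefix + "    "))
        else:
            lines.append(f"{prefix}{name}: shape={child}")
    return lines


layout = {"features": {"mass": (100, 32, 32, 32)}, "label": (100,)}  # hypothetical layout
print("\n".join(draw_tree(layout)))
```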

Model training


Several predefined models built with the PyTorch and JAX frameworks are available in nearl.models. You can reuse these models or write your own.

......
......

View the example project training on a small dataset in the PyTorch or JAX framework

Visualize the trajectory

from nearl import utils, io, data
config = {
  ":LIG<:10&!:SOL,T3P": "ribbon", 
  ":LIG<:5&!:SOL,T3P,WAT": "line", 
  ":LIG": "ball+stick", 
}

traj = io.traj.Trajectory(*data.traj_pair_1)
traj.top.set_reference(traj[0])

dist, info = utils.dist_caps(traj, ":LIG&!@H=", ":LIG<:6&@CA,C,N,O,CB")
tv = utils.TrajectoryViewer(traj)
tv.config_display(config)
tv.add_caps(info["indices_group1"], info["indices_group2"])
tv.resize_stage(400,400)
tv.viewer
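The display masks above use Amber/cpptraj mask syntax: `:LIG<:10` selects residues within 10 Å of residue `LIG`, `&` is a logical AND, `!` negates a selection, and `@H=` matches atom names starting with H. The sketch below reproduces the residue-level distance selection in plain Python; `residues_within` and the toy atoms are hypothetical, and in NEARL the masks are presumably resolved by the pytraj backend.

```python
import math
from typing import Iterable, List, Set, Tuple

Coord = Tuple[float, float, float]
Atom = Tuple[str, int, Coord]  # (residue name, residue number, xyz)


def residues_within(atoms: List[Atom], ref_resname: str, cutoff: float,
                    exclude: Iterable[str] = ()) -> Set[Tuple[str, int]]:
    """Residues with any atom within `cutoff` of any atom of `ref_resname`, like ':REF<:cutoff&!:EXCL'."""
    excluded = set(exclude)
    ref = [xyz for name, _, xyz in atoms if name == ref_resname]
    hits: Set[Tuple[str, int]] = set()
    for name, resid, xyz in atoms:
        if name in excluded:
            continue
        if any(math.dist(xyz, r) <= cutoff for r in ref):
            hits.add((name, resid))
    return hits


atoms = [
    ("LIG", 1, (0.0, 0.0, 0.0)),
    ("LIG", 1, (1.0, 0.0, 0.0)),
    ("ALA", 2, (4.0, 0.0, 0.0)),
    ("SOL", 3, (2.0, 1.0, 0.0)),
    ("GLY", 4, (20.0, 0.0, 0.0)),
]
print(sorted(residues_within(atoms, "LIG", 5.0, exclude={"SOL", "T3P"})))  # [('ALA', 2), ('LIG', 1)]
```

In this sketch the reference residue selects itself, since its own atoms are at distance zero; the brute-force distance loop is fine for a toy example but scales with the product of atom counts.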

Visualize voxelized feature and the molecule block

from nearl import utils, io, data
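A voxelized feature is a 3D grid onto which per-atom values have been interpolated. The sketch below spreads atom weights onto a grid with an unnormalized isotropic Gaussian kernel; it illustrates the general idea of density interpolation only. The function name and kernel choice are assumptions, and NEARL's OpenMP/OpenACC-accelerated interpolation may use a different kernel, cutoff and normalization.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Grid = List[List[List[float]]]


def voxelize(coords: Sequence[Coord], weights: Sequence[float], dims: Tuple[int, int, int],
             spacing: float, origin: Coord, sigma: float = 1.0) -> Grid:
    """Spread per-atom weights onto a 3D grid with an unnormalized isotropic Gaussian kernel."""
    nx, ny, nz = dims
    grid = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for (x, y, z), w in zip(coords, weights):
        for i in range(nx):
            gx = origin[0] + i * spacing  # x position of grid point i
            for j in range(ny):
                gy = origin[1] + j * spacing
                for k in range(nz):
                    gz = origin[2] + k * spacing
                    d2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
                    grid[i][j][k] += w * math.exp(-d2 / (2 * sigma ** 2))
    return grid


# One atom with weight 12.011 (carbon mass) at the centre of a 3x3x3 grid with 1 Å spacing
grid = voxelize([(1.0, 1.0, 1.0)], [12.011], (3, 3, 3), 1.0, (0.0, 0.0, 0.0))
print(round(grid[1][1][1], 3))  # 12.011
```

This brute-force loop visits every grid point for every atom; production code typically limits each atom's contribution to grid points within a few sigma, which is what makes parallel acceleration worthwhile.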

License


MIT License
