NEARL (Nanoscale Environment Assessment and Resonance Landscapes) is a 3D structural
data generation framework for featurizing biomolecules, with a focus on their 3D coordinates and protein dynamics,
so that users can benefit from recent developments in machine learning algorithms.
- Obtain and embed molecule blocks from 3D molecular structures
- Load an arbitrary number of 3D structures into a trajectory container
- Multiple predefined 2D and 3D features for featurization
- A pipeline for featurizing the trajectory container
git clone https://github.com/miemiemmmm/NEARL.git
cd NEARL
Mamba is a package manager implemented in C++ that aims to provide all
the functionality of Conda at higher speed.
Micromamba is a lighter-weight version of Mamba,
aiming to provide a minimal, fast, and standalone executable for environment and package management. Both can be used as drop-in replacements for Conda, and
we recommend using micromamba to manage the Python environment.
If micromamba is not installed on your system, the following script can install it:
# The following command downloads micromamba to /home/myname/micromamba/bin and generates a loadmamba script
bash scripts/install_mamba.sh /home/myname/micromamba
# Use the following command to configure the shell to use micromamba.
# To load micromamba upon starting a new shell, add this line to .bashrc or .zshrc
source /home/myname/micromamba/bin/loadmamba
With micromamba loaded, create a new environment named NEARL and activate it:
bash scripts/create_env_mamba.sh NEARL jax
micromamba activate NEARL
NEARL currently supports only Linux. Installing via PyPI is recommended:
pip install nearl
By default, NEARL uses OpenMP for feature density interpolation. Some key components can also
be accelerated with OpenACC; building this GPU version requires the NVIDIA HPC SDK.
To install the GPU version, run the following command from the root of the cloned repository:
pip install .
Activate the new NEARL environment and run the following commands to test the installation:
# To test the featurizer:
python -c "from nearl import tests; tests.vectorize()"
# To test some simple models:
python -c "from nearl import tests; tests.jax_2dcnn()"
import nearl as nl
_trajfile, _topfile = nl.data.MINI_TRAJ
_parms = nl.data.MINI_PARMS
loader = nl.io.TrajectoryLoader(_trajfile, _topfile)
feat = nl.features.Featurizer3D(_parms)
feat.register_feature(nl.features.Mass())
......
......
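Conceptually, a 3D featurizer turns per-atom values (for Mass, atomic masses) into densities on a voxel grid. The following pure-Python sketch illustrates what feature density interpolation can look like, assuming isotropic Gaussian kernels; the helper `gaussian_density` and its parameters (`dims`, `length`, `sigma`, `cutoff`) are hypothetical, and NEARL's actual kernel, defaults, and compiled OpenMP/OpenACC implementation may differ.

```python
import math


def gaussian_density(coords, weights, center, dims=8, length=8.0, sigma=1.0, cutoff=4.0):
    """Spread per-atom weights onto a dims x dims x dims grid (hypothetical helper).

    coords:  (x, y, z) atom positions in Angstrom
    weights: per-atom values to spread, e.g. atomic masses
    center:  (x, y, z) center of the cubic box
    length:  box edge length in Angstrom; voxel spacing is length / dims
    sigma, cutoff: assumed Gaussian width and distance cutoff in Angstrom
    """
    spacing = length / dims
    origin = [c - length / 2 for c in center]
    # Coordinates of the voxel centers along each axis
    axes = [[origin[a] + (i + 0.5) * spacing for i in range(dims)] for a in range(3)]
    grid = [[[0.0] * dims for _ in range(dims)] for _ in range(dims)]
    for (x, y, z), w in zip(coords, weights):
        for i, gx in enumerate(axes[0]):
            for j, gy in enumerate(axes[1]):
                for k, gz in enumerate(axes[2]):
                    d2 = (x - gx) ** 2 + (y - gy) ** 2 + (z - gz) ** 2
                    # Voxels beyond the cutoff get nothing; the rest get a Gaussian-weighted share
                    if d2 <= cutoff ** 2:
                        grid[i][j][k] += w * math.exp(-d2 / (2 * sigma ** 2))
    return grid


# Two carbon atoms placed symmetrically about the box center
grid = gaussian_density([(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)], [12.011, 12.011], center=(0.0, 0.0, 0.0))
```

This sketch visits every voxel for every atom for clarity; an optimized implementation would visit only voxels within the cutoff of each atom and parallelize the loop, which is where OpenMP or OpenACC acceleration would come in.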
NEARL treats every 3D structure as a trajectory rather than as a separate molecule, and uses pytraj as the backend for trajectory processing.
The trajectory loader currently supports the following formats: NetCDF, PDB, XTC, TRR, DCD, and HDF5 (h5/hdf).
The trajectory loader normally reads trajectory/topology pairs:
from nearl import trajloader
traj_list = [traj1, traj2, traj3, ..., trajn]
top_list = [top1, top2, top3, ..., topn]
traj_loader = trajloader.TrajectoryLoader(traj_list, top_list)
# The loader also takes keyword arguments: TrajectoryLoader(trajs, tops, **kwargs)
for traj in traj_loader:
    # Do something with each trajectory
    ...
A single MD snapshot or a static structure (such as a PDB file) is treated as a trajectory with only one frame. In this case, you only need to load the structures as follows:
from nearl import trajloader
traj_list = [pdb1, pdb2, pdb3, ..., pdbn]
traj_loader = trajloader.TrajectoryLoader(traj_list, range(len(traj_list)))
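The pairing logic behind a trajectory/topology loader can be sketched independently of NEARL. The helper `pair_inputs` below is hypothetical and is not NEARL's API; it assumes that a self-contained structure file such as a PDB carries both topology and coordinates, so it can serve as its own topology and be read as a one-frame trajectory.

```python
from pathlib import Path

# Assumption for this sketch: these formats carry topology and coordinates together
SELF_CONTAINED = {".pdb"}


def pair_inputs(trajs, tops=None):
    """Return (trajectory, topology) pairs (hypothetical helper, not NEARL's API).

    If tops is None, every input must be a self-contained structure file,
    which is then used as its own topology (a one-frame trajectory).
    """
    trajs = [str(t) for t in trajs]
    if tops is None:
        for t in trajs:
            if Path(t).suffix.lower() not in SELF_CONTAINED:
                raise ValueError(f"{t} needs an explicit topology file")
        return list(zip(trajs, trajs))
    tops = [str(t) for t in tops]
    # Every trajectory needs exactly one matching topology
    if len(trajs) != len(tops):
        raise ValueError("trajectory and topology lists differ in length")
    return list(zip(trajs, tops))
```

Validating lengths up front fails early, instead of silently dropping the unmatched tail the way a bare `zip` would.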
Featurizer is the primary hook between features and trajectories.
featurizer = nl.features.Featurizer3D()
......
......
feat = nl.features.Featurizer3D()
feat.register_feature(nl.features.Mass())
feat.register_frame()
......
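Registering several frames lets a feature cover protein dynamics rather than a single snapshot. As an illustration only (this is not taken from NEARL), one simple way to summarize a gridded property across frames is a per-voxel mean and standard deviation; the helper `aggregate_frames` is hypothetical and expects each frame as a flat list of voxel values.

```python
import statistics


def aggregate_frames(frames):
    """Per-voxel mean and population standard deviation across frames (hypothetical helper).

    frames: list of equally sized flat lists, one list of voxel values per frame
    """
    if not frames:
        raise ValueError("no frames")
    size = len(frames[0])
    if any(len(f) != size for f in frames):
        raise ValueError("all frames must have the same number of voxels")
    # Transpose so that each tuple holds one voxel's values over time
    columns = list(zip(*frames))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds
```

The mean captures where a property sits on average, while the standard deviation highlights voxels whose occupancy fluctuates during the trajectory.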
from nearl.featurizer import Featurizer
feat = Featurizer()
# Register an instance of the feature class
feat.register_feature(YourFeature())
# trajectory: a trajectory object, e.g. one yielded by the trajectory loader
feat.register_traj(trajectory)
feat.register_frames(range(100))
index_selected = trajectory.top.select(":LIG")
repr_traji, features_traji = feat.run_by_atom(index_selected, focus_mode="cog")
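We assume that `focus_mode="cog"` centers the featurization box on the center of geometry (the unweighted mean position) of the selected atoms, here the ligand. A minimal sketch of that computation follows; `center_of_geometry` and `grid_bounds` are hypothetical helpers, and the box `length` is an assumed parameter rather than a documented NEARL setting.

```python
def center_of_geometry(coords, indices):
    """Unweighted mean position of the selected atoms (hypothetical helper)."""
    if not indices:
        raise ValueError("empty atom selection")
    n = len(indices)
    return tuple(sum(coords[i][axis] for i in indices) / n for axis in range(3))


def grid_bounds(center, length):
    """Lower and upper corners of a cubic box of edge `length` centered on `center`."""
    half = length / 2
    return tuple(c - half for c in center), tuple(c + half for c in center)


# Three "ligand" atoms selected out of four; the distant fourth atom is ignored
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0), (10.0, 10.0, 10.0)]
cog = center_of_geometry(coords, [0, 1, 2])
lower, upper = grid_bounds(cog, 10.0)
```

Unlike a center of mass, the center of geometry ignores atomic masses, so heavy atoms do not pull the box toward themselves.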
See this script for an example project that featurizes a small subset of the PDBbind dataset.
To define a new feature, inherit from the base class Features and implement the feature method:
from nearl.features import Features

class YourFeature(Features):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Your own initialization

    def feature(self, *args, **kwargs):
        # Your own feature computation goes here
        feature_vector = ...
        return feature_vector
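The body of the feature method is where the per-atom property is computed. Independent of NEARL's base class and its actual arguments, a mass-like feature could map element symbols to standard atomic weights; `mass_feature` and the `default` value for unknown elements are hypothetical choices for this sketch.

```python
# Standard atomic weights (g/mol) for elements common in biomolecules
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.06}


def mass_feature(elements, default=0.0):
    """Per-atom masses from element symbols (hypothetical helper).

    Symbols are normalized ("o" -> "O", " N" -> "N"); unknown elements get `default`.
    """
    return [ATOMIC_MASS.get(e.strip().capitalize(), default) for e in elements]
```

Returning one value per atom keeps the feature independent of the grid, so the same vector can be fed to whatever density interpolation the featurizer applies.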
- TEMPLATE_STRING
......
......
- Since temporal features are
from nearl import hdf
# output_hdffile: path to the HDF file to inspect
with hdf.hdf_operator(output_hdffile, "r") as h5file:
    h5file.draw_structure()
The nearl.models module provides several predefined models built with the
PyTorch and JAX frameworks.
You can reuse these models or write your own.
......
......
See the example projects that train on a small dataset with PyTorch or JAX.
from nearl import utils, io, data
config = {
":LIG<:10&!:SOL,T3P": "ribbon",
":LIG<:5&!:SOL,T3P,WAT": "line",
":LIG": "ball+stick",
}
traj = io.traj.Trajectory(*data.traj_pair_1)
traj.top.set_reference(traj[0])
dist, info = utils.dist_caps(traj, ":LIG&!@H=", ":LIG<:6&@CA,C,N,O,CB")
tv = utils.TrajectoryViewer(traj)
tv.config_display(config)
tv.add_caps(info["indices_group1"], info["indices_group2"])
tv.resize_stage(400,400)
tv.viewer
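The `dist_caps` call above appears to measure distances between ligand heavy atoms and nearby backbone atoms. The underlying geometry can be sketched in plain Python; `pairwise_distances` and `closest_pair` are hypothetical helpers, not NEARL's implementation, and operate on plain (x, y, z) tuples rather than trajectory objects.

```python
import math


def pairwise_distances(group1, group2):
    """Euclidean distance between every atom in group1 and every atom in group2."""
    return [[math.dist(a, b) for b in group2] for a in group1]


def closest_pair(group1, group2):
    """Indices (i, j) and distance of the closest atom pair across the two groups."""
    dists = pairwise_distances(group1, group2)
    return min(
        ((i, j, d) for i, row in enumerate(dists) for j, d in enumerate(row)),
        key=lambda t: t[2],
    )


ligand = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
backbone = [(3.0, 4.0, 0.0), (6.0, 0.0, 0.0)]
i, j, d_min = closest_pair(ligand, backbone)
```

For large selections a vectorized or cell-list approach would be preferable; the nested loops here favor readability.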
from nearl import utils, io, data