Biological networks have a key role in drug discovery for disease treatement. The study of biological networks and identifying critical nodes can facilitate identification of the key targets in treating diseases. However, efficiently finding critical nodes with low cost is challenging. Additionally different experimental works exist to identify critical nodes which are expensive and time consuming. The development of analytics including statistics and machine learning has revolutionized critical node detections [1]. The Critical Node Detection Problem (CNDP) is the optimizaiton problem that consists in finding a set of nodes, the deletion of which maximally degrades network connectivity according to some predefined connectivity metrices[2]. This project is to find a minimum cardinality set of proteins whose removal would optimally dismantle the interactions in the protein- protein network and help neutralize harmful organisms such as bacteria or viruses. Finding key players or nodes in protein-protein interaction networks can have great implications for structure based strategies of drug design. We use Finder [2] a novel frame work that combines techniques from Graph Neural Net (GNN) and Reinforcement Learning (RL) to achieve fast and effective CNDP. The original finder is writen with tensor flow (TF). We wrote Finder with pytorch to implement for our work. And then we applied a serials of ehancements, particularly for human protein-protein networks.
Reference
[1] X.Liu et al., "Coputational methods for identifying the critical nodes in biological networks", vol.21, no.2, pp. 486-497, 2020.
[2] C. Fan, L. Zeng, Y. Sun, and Y.Y. Liu, Finder supplementary, vol. 2, no.6. 2020.
Title: The Human Reference Protein Interactome Mapping Project
link: http://www.interactome-atlas.org/download
This is a PyTorch implementation of FINDER, as described in the paper:
Fan, C., Zeng, L., Sun, Y and Liu Y-Y. Finding key players in complex networks through deep reinforcement learning. Nat Mach Intell (2020).
- Overview
- Repo Contents
- System Requirements
- Installation Guide
- Reproduction instructions
- Basebline methods implementation
Finding an optimal set of nodes, called key players, whose activation (or removal) would maximally enhance (or degrade) certain network functionality, is a fundamental class of problems in network science. Potential applications include network immunization, epidemic control, drug design, and viral marketing. Due to their general NP-hard nature, those problems typically cannot be solved by exact algorithms with polynomial time complexity. Many approximate and heuristic strategies have been proposed to deal with specific application scenarios. Yet, we still lack a unified framework to efficiently solve this class of problems. Here we introduce a deep reinforcement learning framework FINDER, which can be trained purely on small synthetic networks generated by toy models and then applied to a wide spectrum of influencer finding problems. Extensive experiments under various problem settings demonstrate that FINDER significantly outperforms existing methods in terms of solution quality. Moreover, it is several orders of magnitude faster than existing methods for large networks. The presented framework opens up a new direction of using deep learning techniques to understand the organizing principle of complex networks, which enables us to design more robust networks against both attacks and failures.
- code: source code of FINDER for the following four cases in the paper.
- FINDER_CN: source code for the Critical Node (CN) problem under the node-unweighted scenario.
- FINDER_CN_cost: source code for the Critical Node (CN) problem under the node-weighted (including degree-based costs and random costs) scenarios.
- FINDER_ND: source code for the Network Dismantling (ND) problem under the node-unweighted scenario.
- FINDER_ND_CM: source code for the Network Dismantling (ND) problem under the node-unweighted scenario, with configuraion model.
- FINDER_ND_cost: source code for the Network Dismantling (ND) problem under the node-weighted (including degree-based costs and random costs) scenarios.
- FINDER_ND_cost_CM: source code for the Network Dismantling (ND) problem under the node-weighted (including degree-based costs and random costs) scenarios, with configuration model.
- [old_FINDER_CN_cost_tf],[old_FINDER_CN_tf],[old_FINDER_ND_cost_tf],[old_FINDER_ND_tf], old tensor flow versions
- results: results obtained by FINDER for all the cases, which should be the same as are reported in the paper.
Look at reuirements.txt
The package development version is tested on Linux and Windows 10 operating systems. The developmental version of the package has been tested on the following systems:
Linux: Ubuntu 18.04
Windows: 10
The pip package should be compatible with Windows, and Linux operating systems.
Before setting up the FINDER users should have gcc
version 7.4.0 or higher.
The FINDER
model requires a standard computer with enough RAM and GPU to support the operations defined by a user. For minimal performance, this will be a computer with about 4 GB of RAM and 16GB of GPU. For optimal performance, we recommend a computer with the following specs:
RAM: 16+ GB
CPU: 4+ cores, 3.3+ GHz/core
GPU: 16+ GB
The runtimes below are generated using a computer with the recommended specs (16 GB RAM, 4 [email protected] GHz) and internet of speed 25 Mbps.
We are currently working on the pytorch implementation for FINDER_ND_cost
. (look under FINDER_torch.pyx
and train_torch.py
)
- First install all the above required packages, which are contained in the requirements.txt file
pip install -r requirements.txt
- Make all the file
python setup.py build_ext -i
It took about 15 mins to install all the required packages, and about 1 mins to make all the files.
- Train the model,
CUDA_VISIBLE_DEVICES=gpu_id python train_torch.py
Modify the hyper-parameters in FINDER_torch.pyx
to tune the model, and make files after the the modification.
For old tensorflow models:
CUDA_VISIBLE_DEVICES=gpu_id python train.py
Modify the hyper-parameters in FINDER.pyx
to tune the model, and make files after the the modification.
- Test synthetic data,
CUDA_VISIBLE_DEVICES=-1 python testSynthetic.py (do not use GPU for test)
Using the well-trained model (stored in ./models
), you can obtain the results reported in the paper.
- Test real data,
CUDA_VISIBLE_DEVICES=-1 python testReal.py (do not use GPU for test)
Using the well-trained model (stored in ./models
), you can obtain the results reported in the paper.
The experimental results are saved in the results
folder, which contains four subfolders, each of which corresponds to one model, and the synthetic and real results are separated into two different subfolders for the sake of clearity.
Trainning takes about 10 hours for each model.
Test takes about <2 minutes for ND models, 3~4 minutes for ND_cost models.
We compared with HDA, HBA, HCA, HPRA, CI, MinSum, BPD, GND and RatioCut, which are state-of-the-art baselines for network key players finding methods.
We ran HDA, HBA, HCA, and HPRA with Networkx 2.0, and for HDA in large real-world networks, we instead used SNAP, a general-purpose, high-performance system for graph analysis.
http://snap.stanford.edu/snap
We used the source codes released online, and adopted the best parameter settings for each method. For RatioCut, we modified the traditional RatioCut method based on the GND implementation.
https://github.com/zhfkt/ComplexCi (CI)
https://github.com/abraunst/decycler (MinSum)
http://power.itp.ac.cn/~zhouhj/codes.html (BPD)
https://github.com/hcmidt/corehd (CoreHD)
https://github.com/renxiaolong/Generalized-Network-Dismantling (GND and RatioCut)
Here is the link to the dataset that was used in the paper, including: 1) real data: nine different test data collected from the SNAP repository; 2) synthetic data: generated by Barab'{a}si-Albert (BA) model.
https://drive.google.com/open?id=1HAxIUsgOPYXHDikmlTIKIXunGWWLXbr2
Please cite our work if you find our code/paper is useful to your work.
@article{fan2020finding,
title={Finding key players in complex networks through deep reinforcement learning},
author={Fan, Changjun and Zeng, Li and Sun, Yizhou and Liu, Yang-Yu},
journal={Nature Machine Intelligence},
pages={1--8},
year={2020},
publisher={Nature Publishing Group}
}