Skip to content

PyTorch implementation of FINDER - FInding key players in complex Networks through DEep Reinforcement learning (Nature Machine Intelligence)

License

Notifications You must be signed in to change notification settings

faraz2023/FINDER-pytorch

 
 

Repository files navigation

Attention: This repo is currently under construction


Introduction

Biological networks have a key role in drug discovery for disease treatement. The study of biological networks and identifying critical nodes can facilitate identification of the key targets in treating diseases. However, efficiently finding critical nodes with low cost is challenging. Additionally different experimental works exist to identify critical nodes which are expensive and time consuming. The development of analytics including statistics and machine learning has revolutionized critical node detections [1]. The Critical Node Detection Problem (CNDP) is the optimizaiton problem that consists in finding a set of nodes, the deletion of which maximally degrades network connectivity according to some predefined connectivity metrices[2]. This project is to find a minimum cardinality set of proteins whose removal would optimally dismantle the interactions in the protein- protein network and help neutralize harmful organisms such as bacteria or viruses. Finding key players or nodes in protein-protein interaction networks can have great implications for structure based strategies of drug design. We use Finder [2] a novel frame work that combines techniques from Graph Neural Net (GNN) and Reinforcement Learning (RL) to achieve fast and effective CNDP. The original finder is writen with tensor flow (TF). We wrote Finder with pytorch to implement for our work. And then we applied a serials of ehancements, particularly for human protein-protein networks.

Reference

[1] X.Liu et al., "Coputational methods for identifying the critical nodes in biological networks", vol.21, no.2, pp. 486-497, 2020.

[2] C. Fan, L. Zeng, Y. Sun, and Y.Y. Liu, Finder supplementary, vol. 2, no.6. 2020.

Data

Data source (raw)

Title: The Human Reference Protein Interactome Mapping Project

link: http://www.interactome-atlas.org/download

Seed model:

FINDER (FInding key players in complex Networks through DEep Reinforcement learning)

This is a PyTorch implementation of FINDER, as described in the paper:

Fan, C., Zeng, L., Sun, Y and Liu Y-Y. Finding key players in complex networks through deep reinforcement learning. Nat Mach Intell (2020). demo

Contents

Overview

Finding an optimal set of nodes, called key players, whose activation (or removal) would maximally enhance (or degrade) certain network functionality, is a fundamental class of problems in network science. Potential applications include network immunization, epidemic control, drug design, and viral marketing. Due to their general NP-hard nature, those problems typically cannot be solved by exact algorithms with polynomial time complexity. Many approximate and heuristic strategies have been proposed to deal with specific application scenarios. Yet, we still lack a unified framework to efficiently solve this class of problems. Here we introduce a deep reinforcement learning framework FINDER, which can be trained purely on small synthetic networks generated by toy models and then applied to a wide spectrum of influencer finding problems. Extensive experiments under various problem settings demonstrate that FINDER significantly outperforms existing methods in terms of solution quality. Moreover, it is several orders of magnitude faster than existing methods for large networks. The presented framework opens up a new direction of using deep learning techniques to understand the organizing principle of complex networks, which enables us to design more robust networks against both attacks and failures.

Repo Contents

  • code: source code of FINDER for the following four cases in the paper.
    • FINDER_CN: source code for the Critical Node (CN) problem under the node-unweighted scenario.
    • FINDER_CN_cost: source code for the Critical Node (CN) problem under the node-weighted (including degree-based costs and random costs) scenarios.
    • FINDER_ND: source code for the Network Dismantling (ND) problem under the node-unweighted scenario.
    • FINDER_ND_CM: source code for the Network Dismantling (ND) problem under the node-unweighted scenario, with configuraion model.
    • FINDER_ND_cost: source code for the Network Dismantling (ND) problem under the node-weighted (including degree-based costs and random costs) scenarios.
    • FINDER_ND_cost_CM: source code for the Network Dismantling (ND) problem under the node-weighted (including degree-based costs and random costs) scenarios, with configuration model.
    • [old_FINDER_CN_cost_tf],[old_FINDER_CN_tf],[old_FINDER_ND_cost_tf],[old_FINDER_ND_tf], old tensor flow versions
  • results: results obtained by FINDER for all the cases, which should be the same as are reported in the paper.

System Requirements

Software dependencies and operating systems

Software dependencies

Look at reuirements.txt

Operating systems

The package development version is tested on Linux and Windows 10 operating systems. The developmental version of the package has been tested on the following systems:

Linux: Ubuntu 18.04
Windows: 10

The pip package should be compatible with Windows, and Linux operating systems.

Before setting up the FINDER users should have gcc version 7.4.0 or higher.

Hardware Requirements

The FINDER model requires a standard computer with enough RAM and GPU to support the operations defined by a user. For minimal performance, this will be a computer with about 4 GB of RAM and 16GB of GPU. For optimal performance, we recommend a computer with the following specs:

RAM: 16+ GB
CPU: 4+ cores, 3.3+ GHz/core GPU: 16+ GB

The runtimes below are generated using a computer with the recommended specs (16 GB RAM, 4 [email protected] GHz) and internet of speed 25 Mbps.

Installation Guide

For pytorch:

We are currently working on the pytorch implementation for FINDER_ND_cost. (look under FINDER_torch.pyx and train_torch.py)

Instructions

  1. First install all the above required packages, which are contained in the requirements.txt file
pip install -r requirements.txt
  1. Make all the file
python setup.py build_ext -i

Typical install time

It took about 15 mins to install all the required packages, and about 1 mins to make all the files.

Reproduction instructions

Instructions to run

  1. Train the model,
CUDA_VISIBLE_DEVICES=gpu_id python train_torch.py

Modify the hyper-parameters in FINDER_torch.pyx to tune the model, and make files after the the modification.

For old tensorflow models:

CUDA_VISIBLE_DEVICES=gpu_id python train.py

Modify the hyper-parameters in FINDER.pyx to tune the model, and make files after the the modification.

  1. Test synthetic data,
CUDA_VISIBLE_DEVICES=-1 python testSynthetic.py (do not use GPU for test)

Using the well-trained model (stored in ./models), you can obtain the results reported in the paper.

  1. Test real data,
CUDA_VISIBLE_DEVICES=-1 python testReal.py (do not use GPU for test)

Using the well-trained model (stored in ./models), you can obtain the results reported in the paper.

Expected output

The experimental results are saved in the results folder, which contains four subfolders, each of which corresponds to one model, and the synthetic and real results are separated into two different subfolders for the sake of clearity.

Expected run time

Trainning takes about 10 hours for each model.

Test takes about <2 minutes for ND models, 3~4 minutes for ND_cost models.

Basebline methods implementation

We compared with HDA, HBA, HCA, HPRA, CI, MinSum, BPD, GND and RatioCut, which are state-of-the-art baselines for network key players finding methods.

We ran HDA, HBA, HCA, and HPRA with Networkx 2.0, and for HDA in large real-world networks, we instead used SNAP, a general-purpose, high-performance system for graph analysis.

http://snap.stanford.edu/snap 

We used the source codes released online, and adopted the best parameter settings for each method. For RatioCut, we modified the traditional RatioCut method based on the GND implementation.

https://github.com/zhfkt/ComplexCi (CI)
https://github.com/abraunst/decycler (MinSum)
http://power.itp.ac.cn/~zhouhj/codes.html (BPD)
https://github.com/hcmidt/corehd (CoreHD)
https://github.com/renxiaolong/Generalized-Network-Dismantling (GND and RatioCut)

Reproducing the results that reported in the FINDER paper

Here is the link to the dataset that was used in the paper, including: 1) real data: nine different test data collected from the SNAP repository; 2) synthetic data: generated by Barab'{a}si-Albert (BA) model.

https://drive.google.com/open?id=1HAxIUsgOPYXHDikmlTIKIXunGWWLXbr2

Reference

Please cite our work if you find our code/paper is useful to your work.

@article{fan2020finding,
   title={Finding key players in complex networks through deep reinforcement learning},
   author={Fan, Changjun and Zeng, Li and Sun, Yizhou and Liu, Yang-Yu},
   journal={Nature Machine Intelligence},
   pages={1--8},
   year={2020},
   publisher={Nature Publishing Group}
 }

About

PyTorch implementation of FINDER - FInding key players in complex Networks through DEep Reinforcement learning (Nature Machine Intelligence)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 59.1%
  • Cython 18.2%
  • C++ 13.3%
  • Python 9.4%