GitHub repository of the study:
Inferring Off-Target effects of drugs on cellular signaling using Interactome-Based deep learning
Nikolaos Meimetis¹, Douglas A. Lauffenburger¹, Avlant Nilsson¹,²,³,*
1. Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2. Department of Cell and Molecular Biology, SciLifeLab, Karolinska Institutet, Sweden
3. Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, SE 41296, Sweden

\* Corresponding author, [email protected]
doi: https://doi.org/10.1016/j.isci.2024.109509
This repository is administered by @NickMeim. For questions contact [email protected]
Trained models of this study are too big to be uploaded here and are available upon reasonable request. Supplementary Data File 1.xlsx and Supplementary Data File 2.xlsx are in the results folder.
Ensembles of 50 models are trained for 33 cell lines in the L1000 dataset and are available here:
Many diseases emerge from dysregulated cellular signaling, and drugs are often designed to target specific nodes in cellular networks e.g. signaling proteins, or transcription factors. However, off-target effects are common and may ultimately result in failed clinical trials. Computational modeling of the cell’s transcriptional response to drugs could improve our understanding of their mechanisms of action. Here we develop such an approach based on ensembles of artificial neural networks, that simultaneously infer drug-target interactions and their downstream effects on intracellular signaling. Applied to gene expression data from different cell lines, it outperforms basic machine learning approaches in predicting transcription factors’ activity, while recovering most known drug-target interactions and inferring many new, which we validate in an independent dataset. As a case study, we explore the inferred interactions of the drug Lestaurtinib and its effects on downstream signaling. Beyond its intended target (FLT3) the model predicts an inhibition of CDK2 that enhances downregulation of the cell cycle-critical transcription factor FOXM1, corroborating literature findings. Our approach can therefore enhance our understanding of drug signaling for therapeutic design.
The current repository contains code for:
- Initial evaluation of the quality and preprocessing of the data.
- Training and fitting of ANN and other models.
- Evaluation of the predictions of various models.
- Network construction of the MoA of off-target effects of drugs.
- Drug-target interaction inference.
- Code to re-create the results of the research article.
To run your own case study, follow the instructions in each folder (user-friendly scripts are explained in each folder's README file):
- First visit the preprocessing folder.
- Then visit the learning folder.
- Then visit the postprocessing folder.
- Finally visit the MoA folder.
The transcriptomic signatures (level 3 and level 5 profiles) of the L1000 CMap resource [1] are used for this study, together with data from the Bioconductor resource [2].
The transcriptomic profiles were generated by measuring 978 important (landmark) genes in cancer with a Luminex bead-based assay and computationally inferring the rest [1].
Details on how to access these data can be found in the data folder. The main resources are available on GEO under accession GSE92742.
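As a rough illustration of separating the measured landmark genes from the inferred ones, the sketch below filters a tab-separated gene annotation table by a landmark flag. The column names `pr_gene_symbol` and `pr_is_lm` are assumptions about the gene info file distributed with GSE92742 and should be checked against the file you download. The inline table uses placeholder gene names, not real annotations.

```python
import csv
import io


def landmark_symbols(handle, symbol_col="pr_gene_symbol", flag_col="pr_is_lm"):
    """Return the symbols of rows whose landmark flag equals "1".

    `handle` is any text file object holding a tab-separated table.
    The default column names are assumptions, not confirmed by this repository.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[symbol_col] for row in reader if row[flag_col].strip() == "1"]


# Toy table with placeholder gene names (hypothetical, for illustration only).
toy_table = (
    "pr_gene_symbol\tpr_is_lm\n"
    "GENE_A\t1\n"
    "GENE_B\t0\n"
    "GENE_C\t1\n"
)

landmarks = landmark_symbols(io.StringIO(toy_table))
print(landmarks)
```

For a downloaded, gzip-compressed annotation file, the same function could be called on a handle opened with `gzip.open(path, "rt")`, where `path` points to wherever you stored the file.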
- article_supplementary_info : Folder containing code to re-create the supplementary figures and tables of the article
- data : Folder that should contain the retrieved raw data of the study.
- figures : Folder containing the scripts to produce the figures of the study.
- learning : Folder containing deep learning and machine learning algorithms and models.
- preprocessing : Folder containing scripts to pre-process the raw data and evaluate their quality.
- preprocessed_data : Here the pre-processed data to be used in the subsequent analysis are stored.
- results : Folder where the results of subsequent analyses should be stored. It also contains all the inferred interactions in Supplementary Data File 1.xlsx.
- postprocessing : Folder containing scripts to evaluate models' results and predictions.
- MoA : Folder containing code and data to construct the MoA of off-target effects.
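Before running the scripts, it may help to confirm that a local clone contains the folders listed above. The following is a minimal sketch, not part of the repository: the helper name `missing_folders` is hypothetical, and the folder names are copied from the list above.

```python
from pathlib import Path

# Folder names copied from the repository layout described above.
EXPECTED_FOLDERS = [
    "article_supplementary_info",
    "data",
    "figures",
    "learning",
    "preprocessing",
    "preprocessed_data",
    "results",
    "postprocessing",
    "MoA",
]


def missing_folders(repo_root):
    """Return the expected folders that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


# Check the current working directory (assumed to be the repository root).
absent = missing_folders(".")
if absent:
    print("Missing folders:", ", ".join(absent))
else:
    print("All expected folders are present.")
```

Run it from the root of the clone. Folders such as data may need to be created and filled manually, as described in their README files.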
The study utilizes multiple resources from the Python and R programming languages.
Important Note:
- This installation has been validated on Unix-based systems, macOS, and Windows.
- On Linux, some external dependencies may need to be installed manually, especially for tidyverse. Please check each library's documentation online.
- Please note that macOS is not compatible with the GPU components of this installation guide (which are optional).
Python installation
# After installing anaconda create a conda environment:
conda create -n DTLembas
conda activate DTLembas
conda install -c conda-forge rdkit
conda install -c conda-forge scikit-learn
pip install networkx
# For the general (CPU) PyTorch version, run the command after these comments.
# For a GPU installation, run this command instead, adjusting pytorch-cuda to your own CUDA version:
# conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pytorch torchvision torchaudio -c pytorch
conda install captum -c pytorch
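Once the environment is set up, a quick check that the packages can be found may save time later. The sketch below is an assumption-labeled helper, not part of the repository: it uses the import names of the packages installed above (scikit-learn imports as `sklearn`, pytorch as `torch`) and reports any that cannot be located in the active environment.

```python
import importlib.util

# Import names of the packages installed above
# (scikit-learn imports as sklearn, pytorch as torch).
REQUIRED_MODULES = ["rdkit", "sklearn", "networkx", "torch", "captum"]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")

# Optional GPU check, only attempted when torch is installed.
if "torch" not in missing:
    import torch

    print("CUDA available:", torch.cuda.is_available())
```

Run it with the DTLembas environment activated. On a CPU-only installation the CUDA check is expected to print False.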
R installation
Install RStudio, open it, and run:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("cmapR","rhdf5","dorothea","org.Hs.eg.db","hgu133a.db"))
if (!require("tidyverse", quietly = TRUE))
    install.packages("tidyverse")
if (!require("ggplot2", quietly = TRUE))
    install.packages("ggplot2")
install.packages("ggrepel")
install.packages("ggpubr")
install.packages("doRNG")
install.packages("doFuture")
Alternatively, use conda and always use R from the terminal:
conda create -n DTLembas_r_env
conda activate DTLembas_r_env
conda install -c r r-essentials
conda install r-BiocManager
conda install conda-forge::r-ggrepel
conda install r-ggpubr
conda install r-doRNG
conda install r-doFuture
# Start an R session from the terminal, then run the next line inside R:
R
BiocManager::install(c("cmapR","rhdf5","dorothea","org.Hs.eg.db","hgu133a.db"))
R dependencies: Check the list below and manually install the libraries you need.
The following R libraries and versions were used to produce the figures and results of the study (any version of these libraries is appropriate):
- R version 4.1.2
- tidyverse 1.3.1
- BiocManager 1.30.16
- cmapR 1.4.0
- org.Hs.eg.db 3.13.0
- rhdf5 2.36.0
- doFuture 0.12.0
- doRNG 1.8.2
- ggplot2 3.3.5
- ggpubr 0.4.0
- GeneExpressionSignature 1.38.0
- caret 6.0-94
- ggpubr 0.6.0
- ggpattern 1.1.0
- ggridges 0.5.4
- ggrepel 0.9.3
- rstatix 0.7.2
- patchwork 1.1.2.9000
- dorothea 1.4.2
- AnnotationDbi 1.54.1
- PharmacoGx 2.4.0
- GEOquery 2.60.0
- hgu133a.db 3.13.0
- limma 3.48.3
- affy 1.70.0
- dbparser 2.0.1
Python dependencies: First install conda (Anaconda) on your computer, then run the commands from the Python installation section in a bash terminal.
The following Python libraries and versions were used (different versions are possibly also appropriate):
- python 3.8.8
- seaborn 0.11.2 (version does not matter for this library)
- numpy 1.20.3 (version does not matter for this library)
- pandas 1.3.5 (version does not matter for this library)
- matplotlib 3.5.1 (version does not matter for this library)
- scipy 1.7.3
- scikit-learn 1.0.2
- networkx 2.6.3
- rdkit 2021.03.5
- captum 0.5.0
- pytorch 1.12.0
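To see how an existing environment compares with the versions listed above, a small report such as the following can be used. This is a sketch under stated assumptions: the dictionary keys are distribution names as `importlib.metadata` would see them (assuming pytorch is registered as `torch`), and only a subset of the list is included for brevity.

```python
from importlib import metadata

# Versions copied from the list above, keyed by distribution name.
# "torch" as the distribution name for pytorch is an assumption.
LISTED_VERSIONS = {
    "scipy": "1.7.3",
    "scikit-learn": "1.0.2",
    "networkx": "2.6.3",
    "captum": "0.5.0",
    "torch": "1.12.0",
}


def compare_versions(listed):
    """Map each package to (listed, installed); installed is None if absent."""
    report = {}
    for name, wanted in listed.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(LISTED_VERSIONS).items():
    print(f"{name}: listed {wanted}, installed {installed or 'not found'}")
```

A mismatch is not necessarily a problem, since other versions may also work. The report is mainly useful when trying to reproduce the results of the study exactly.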
Footnotes
1. Subramanian, Aravind, et al. "A next generation connectivity map: L1000 platform and the first 1,000,000 profiles." Cell 171.6 (2017): 1437-1452.
2. Gentleman, Robert C., et al. "Bioconductor: open software development for computational biology and bioinformatics." Genome biology 5.10 (2004): 1-16.