Deciphering driver regulators of cell fate decisions from single-cell RNA-seq data

WPZgithub/CEFCON

CEFCON

CEFCON is a computational tool for deciphering driver regulators of cell fate decisions from single-cell RNA-seq data. It takes as input a prior gene interaction network and expression profiles from scRNA-seq data associated with a given developmental trajectory, and consists of three main components: cell-lineage-specific gene regulatory network (GRN) construction, driver regulator identification, and regulon-like gene module (RGM) identification.

Figure: Overview of CEFCON (Overview.png)

About method

CEFCON first employs graph attention neural networks under a contrastive learning framework to construct reliable GRNs for specific developmental cell lineages (Fig. b). It then characterizes gene regulatory dynamics from the perspective of network control theory and identifies the driver regulators that steer cell fate decisions (Fig. c). Finally, CEFCON detects gene regulatory modules (i.e., RGMs) involving the identified driver regulators and measures their activities using AUCell (Fig. d).
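AUCell scores a gene set in each cell by ranking that cell's genes from highest to lowest expression and measuring how early the set's genes appear near the top of the ranking. The sketch below is a simplified, pure-Python illustration of that idea, not CEFCON's or the AUCell package's implementation. The function name `aucell_scores`, the `max_rank` cutoff parameter, and breaking ties by column order are assumptions made for the example.

```python
from typing import List, Sequence, Set


def aucell_scores(
    expr: Sequence[Sequence[float]],
    genes: Sequence[str],
    gene_set: Set[str],
    max_rank: int,
) -> List[float]:
    """Simplified AUCell-style score of one gene set for every cell.

    expr: cells x genes expression matrix (each row is one cell).
    max_rank: only the top `max_rank` genes of each cell's ranking count.
    Ties are broken by column order (the real AUCell breaks ties randomly).
    """
    n_in_set = len(gene_set.intersection(genes))
    k = min(max_rank, len(genes))
    # Largest achievable area: all set genes ranked at the very top.
    max_auc = sum(min(r, n_in_set) for r in range(1, k + 1))
    scores = []
    for row in expr:
        # Rank genes from highest to lowest expression in this cell
        # (sorted() is stable, so equal values keep their column order).
        order = sorted(range(len(genes)), key=lambda j: -row[j])
        hits = 0
        auc = 0
        for j in order[:k]:
            if genes[j] in gene_set:
                hits += 1
            auc += hits  # area under the recovery curve, one step per rank
        scores.append(auc / max_auc if max_auc else 0.0)
    return scores
```

For example, `aucell_scores([[5, 4, 1, 0], [0, 1, 4, 5]], ["A", "B", "C", "D"], {"A", "B"}, max_rank=2)` gives `[1.0, 0.0]`: the set genes top the first cell's ranking and fall outside the top two in the second cell.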

Installation

CEFCON was originally tested on Ubuntu 20.04 with Python 3.8 to 3.10. We recommend running CEFCON with CUDA if possible. The following packages are required to run this code:
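Since CEFCON was tested with Python 3.8 to 3.10, a quick check of the interpreter before installing can save debugging time. This is a minimal sketch; the helper name `python_version_tested` is hypothetical and not part of CEFCON, and a version outside the range may still work.

```python
import sys
from typing import Optional, Tuple


def python_version_tested(version: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if (major, minor) lies in the tested range 3.8 to 3.10."""
    if version is None:
        version = sys.version_info[:2]
    return (3, 8) <= tuple(version) <= (3, 10)


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} in tested range: {python_version_tested()}")
```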

Requirements

Optional (for performance evaluation, visualization and other analyses)

  • matplotlib(>=3.5.3)
  • matplotlib-venn(>=0.11.7)
  • seaborn(>=0.12.1)
  • palantir(==1.0.1)
  • rpy2(>=3.4.1)
  • R(>=3.6)
    • PRROC (R package)
    • slingshot (R package)
    • MAST (R package)
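To see which of the optional Python packages above are already installed, you can look up their import modules with `importlib.util.find_spec`. This sketch is illustrative only: the mapping from pip names to import names (e.g. `matplotlib-venn` to `matplotlib_venn`) is our assumption, version constraints are not checked, and the R packages (PRROC, slingshot, MAST) must be checked from within R.

```python
import importlib.util
from typing import Dict, List

# pip distribution name -> Python import name (assumed) for the optional packages
OPTIONAL_PACKAGES: Dict[str, str] = {
    "matplotlib": "matplotlib",
    "matplotlib-venn": "matplotlib_venn",
    "seaborn": "seaborn",
    "palantir": "palantir",
    "rpy2": "rpy2",
}


def missing_packages(import_names: Dict[str, str]) -> List[str]:
    """Return the distribution names whose import module cannot be found."""
    return [
        dist
        for dist, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    print("Missing optional packages:", missing_packages(OPTIONAL_PACKAGES))
```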

Install using pip

pip install git+https://github.com/WPZgithub/CEFCON.git

Using GUROBI

We recommend using GUROBI to solve the integer linear programming (ILP) problem when identifying driver genes. GUROBI is a commercial solver that requires a license to run. Fortunately, it offers free licenses for academic use as well as trial licenses outside academia. Once you have a license, install the gurobipy package.

If you cannot use GUROBI, the non-commercial solver SCIP will be used instead. However, SCIP is not guaranteed to find a solution.
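The GUROBI-first, SCIP-fallback behavior can be pictured as a simple availability check. This is a hypothetical sketch of the pattern, not CEFCON's solver-selection code; the function `choose_ilp_solver` is invented for illustration, and it assumes the Python interfaces are `gurobipy` and `pyscipopt` (the PySCIPOpt package). Note that an importable `gurobipy` does not prove a valid license is installed.

```python
import importlib.util
from typing import Callable, Optional


def choose_ilp_solver(has_module: Optional[Callable[[str], bool]] = None) -> str:
    """Prefer GUROBI when gurobipy is importable, otherwise fall back to SCIP.

    `has_module` can be injected for testing; by default it checks whether
    the module can be found on the current Python path.
    """
    if has_module is None:
        has_module = lambda name: importlib.util.find_spec(name) is not None
    if has_module("gurobipy"):
        return "GUROBI"  # a license is still required at solve time
    if has_module("pyscipopt"):
        return "SCIP"    # non-commercial, but may fail to find a solution
    raise RuntimeError("No ILP solver found: install gurobipy or pyscipopt.")
```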

Using GPU

We recommend using a GPU. If you do, you will need to install the GPU version of PyTorch.
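After installing the GPU version of PyTorch, you can confirm that PyTorch actually sees a CUDA device. `torch.cuda.is_available()` is a standard PyTorch call; the wrapper `cuda_available` is a hypothetical helper that also returns `False` when PyTorch is not installed at all.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("CUDA available to PyTorch:", cuda_available())
```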

Usage example

Command line usage

cefcon [-h] --input_expData PATH --input_priorNet PATH [--input_genesDE PATH] \
           [--additional_edges_pct ADDITIONAL_EDGES_PCT] [--cuda CUDA] [--seed SEED] \
           [--hidden_dim HIDDEN_DIM] [--output_dim OUTPUT_DIM] [--heads HEADS] [--attention {COS,AD,SD}] \
           [--miu MIU] [--epochs EPOCHS] [--repeats REPEATS] [--edge_threshold_param EDGE_THRESHOLD_PARAM] \
           [--remove_self_loops] [--topK_drivers TOPK_DRIVERS] --out_dir OUT_DIR

Run cefcon.py -h to view descriptions of all parameters.
See the run_CEFCON.sh bash script for a usage example.
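If you drive CEFCON from a Python script, the command line can be assembled as an argument list using the flags from the usage line above. This is a sketch: `build_cefcon_command` is a hypothetical helper, the file names in the example are placeholders, and it only covers a subset of the flags.

```python
import shlex
from typing import List, Optional


def build_cefcon_command(
    exp_data: str,
    prior_net: str,
    out_dir: str,
    genes_de: Optional[str] = None,
    cuda: Optional[str] = None,
    remove_self_loops: bool = False,
) -> List[str]:
    """Assemble a cefcon command line from flags shown in the usage line."""
    cmd = ["cefcon",
           "--input_expData", exp_data,
           "--input_priorNet", prior_net,
           "--out_dir", out_dir]
    if genes_de is not None:
        cmd += ["--input_genesDE", genes_de]
    if cuda is not None:
        cmd += ["--cuda", cuda]
    if remove_self_loops:
        cmd.append("--remove_self_loops")
    return cmd


if __name__ == "__main__":
    # Placeholder file names; substitute your own data.
    cmd = build_cefcon_command("expression.csv", "prior_network.csv", "results")
    print(shlex.join(cmd))
    # To run it: subprocess.run(cmd, check=True)
```

Passing a list (rather than one shell string) to `subprocess.run` avoids quoting problems with paths that contain spaces.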

Input data

  • scRNA-seq data: a '.csv' file in which rows represent cells and columns represent genes, or an '.h5ad' file containing an AnnData object.
  • Prior gene interaction network: a network file in edgelist format.
     Prior gene interaction networks for human and mouse are provided in /prior_data.
  • Gene differential expression level (optional): a '.csv' file containing the log fold change of each gene.
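Before a run, it can help to check how many genes in the expression matrix also appear in the prior network. The sketch below assumes a cells x genes CSV whose header row holds gene names with cell IDs in the first column, and an edge list CSV with a header row whose first two columns are the interacting genes; both layouts are assumptions, so adapt the parsing to your files. The helper names and the toy data are hypothetical.

```python
import csv
import io
from typing import Set, TextIO, Tuple


def genes_from_expression_csv(f: TextIO) -> Set[str]:
    """Gene names from the header of a cells x genes CSV (first column = cell IDs)."""
    header = next(csv.reader(f))
    return set(header[1:])


def genes_from_edgelist_csv(f: TextIO) -> Set[str]:
    """Genes appearing in the first two columns of an edge list with a header row."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    genes: Set[str] = set()
    for row in reader:
        if len(row) >= 2:
            genes.update(row[:2])
    return genes


def overlap_report(expr_f: TextIO, net_f: TextIO) -> Tuple[int, int, int]:
    """Return (#expression genes, #network genes, #shared genes)."""
    expr_genes = genes_from_expression_csv(expr_f)
    net_genes = genes_from_edgelist_csv(net_f)
    return len(expr_genes), len(net_genes), len(expr_genes & net_genes)


if __name__ == "__main__":
    # Toy in-memory files standing in for real inputs.
    expr = io.StringIO("cell,GATA1,SPI1,KLF1\nc1,0.5,1.2,0.0\nc2,1.1,0.0,2.3\n")
    net = io.StringIO("from,to\nGATA1,KLF1\nSPI1,CEBPA\n")
    print(overlap_report(expr, net))  # (3, 4, 3)
```

Gene symbols are compared as exact strings, so a low overlap often points to a naming mismatch (e.g. human versus mouse capitalization).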

An example input dataset (i.e., the hESC dataset with 1,000 highly variable genes) is located in /example_data. All input data used in the paper can be downloaded from here.

The output results can be found in the folder ${OUT_DIR}/:

- "cell_lineage_GRN.csv": the constructed cell-lineage-specific GRN;
- "gene_embs.csv": the obtained gene embeddings;
- "driver_regulators.csv": a list of identified driver regulators;
- "RGMs.csv": a list of obtained RGMs;
- "AUCell_mtx.csv": the AUCell activity matrix of the obtained RGMs.
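After a run finishes, a quick way to confirm that every expected result was written is to check for the five files listed above. This is a sketch assuming the flat layout described here, with all files directly inside ${OUT_DIR}/; the helper `check_outputs` is hypothetical.

```python
from pathlib import Path
from typing import Dict, Union

# Output file names as listed in this README
EXPECTED_OUTPUTS = (
    "cell_lineage_GRN.csv",
    "gene_embs.csv",
    "driver_regulators.csv",
    "RGMs.csv",
    "AUCell_mtx.csv",
)


def check_outputs(out_dir: Union[str, Path]) -> Dict[str, bool]:
    """Map each expected output file name to whether it exists in out_dir."""
    out = Path(out_dir)
    return {name: (out / name).is_file() for name in EXPECTED_OUTPUTS}


if __name__ == "__main__":
    status = check_outputs("results")  # placeholder output directory
    for name, present in status.items():
        print(f"{name}: {'found' if present else 'missing'}")
```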

Package usage

For a quick start, see the example Jupyter Notebook.
Please check this Notebook for scRNA-seq preprocessing.

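import cefcon as cf

# `adata`: preprocessed scRNA-seq data as an AnnData object (see the preprocessing Notebook)
# `prior_network`: the prior gene interaction network; both are assumed to be loaded already

# Data preparation: returns one entry per cell lineage
data = cf.data_preparation(adata, prior_network)

cefcon_results_dict = {}  # keep the results of every lineage
for lineage, data_li in data.items():
    # Construct cell-lineage-specific GRN
    cefcon_GRN_model = cf.NetModel(epochs=350, repeats=3, cuda='0')
    cefcon_GRN_model.run(data_li)

    cefcon_results = cefcon_GRN_model.get_cefcon_results(edge_threshold_avgDegree=8)

    # Identify driver regulators
    cefcon_results.gene_influence_score()
    cefcon_results.driver_regulators()

    # Identify regulon-like gene modules
    cefcon_results.RGM_activity()

    cefcon_results_dict[lineage] = cefcon_results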

Please check this Notebook for results visualization and analyses.

Citation

Please cite the following paper if you find the repository or the paper useful.

Peizhuo Wang, Xiao Wen, Han Li, Peng Lang, Shuya Li, Yipin Lei, Hantao Shu, Lin Gao, Dan Zhao and Jianyang Zeng, A network-based framework for deciphering driver regulators of cell fate decisions from single-cell RNA-seq data, Preprint, 2023

@article{wang2023cefcon,
  title={A network-based framework for deciphering driver regulators of cell fate decisions from single-cell RNA-seq data},
  author={Wang, Peizhuo and Wen, Xiao and Li, Han and Lang, Peng and Li, Shuya and Lei, Yipin and Shu, Hantao and Gao, Lin and Zhao, Dan and Zeng, Jianyang},
  journal={-},
  year={2023}
}

Bugs & Suggestions

Please contact [email protected] or open an issue in the GitHub repository with any questions.