scDeepSort

Reference-free Cell-type Annotation for Single-cell Transcriptomics using Deep Learning with a Weighted Graph Neural Network

Recent advances in single-cell RNA sequencing (scRNA-seq) have enabled large-scale transcriptional characterization of thousands of cells across multiple complex tissues, making accurate cell-type identification a prerequisite and vital step in scRNA-seq studies.

To address this challenge, we developed scDeepSort, a reference-free cell-type annotation method based on a state-of-the-art deep learning algorithm: a modified graph neural network (GNN) model. To our knowledge, this is the first time a GNN has been introduced into scRNA-seq studies, and it shows strong performance in this application. In brief, scDeepSort is built on our weighted GNN framework and was trained on two embedded high-quality scRNA-seq atlases, together containing 764,741 cells across 88 human and mouse tissues, which are the most comprehensive multi-organ scRNA-seq data resources to date. For more information, please refer to the preprint in bioRxiv 2020.05.13.094953.

Install

Download the source code of scDeepSort.
Download the pretrained models from the release page and uncompress them:

tar -xzvf pretrained.tar.gz

After executing the above steps, the final scDeepSort tree should look like this:

 |- pretrained
     |- human
         |- graphs
         |- statistics
         |- models
     |- mouse
         |- graphs
         |- statistics
         |- models
 |- test
     |- human
     |- mouse
 |- models
     |- __init__.py
     |- gnn.py
 |- utils
     |- __init__.py
     |- preprocess.py
 |- run.py
 |- requirements.txt
 |- README.md
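To confirm that the archive was unpacked into the layout above, a small check like the following can help. It is a minimal sketch that assumes the archive unpacks to a pretrained/ folder and that the directory names match the tree exactly; the list is copied from the tree, not read from run.py.

```python
from pathlib import Path

# Expected sub-directories, copied from the tree above (an assumption:
# run.py itself is not consulted).
EXPECTED_DIRS = [
    "pretrained/human/graphs",
    "pretrained/human/statistics",
    "pretrained/human/models",
    "pretrained/mouse/graphs",
    "pretrained/mouse/statistics",
    "pretrained/mouse/models",
    "test/human",
    "test/mouse",
    "models",
    "utils",
]


def missing_dirs(root="."):
    """Return the expected sub-directories that do not exist under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Directory layout looks complete.")
```

Run it from the scDeepSort root; an empty result means every folder in the tree is present.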

Dependency

The dependencies are listed in requirements.txt and can be installed with pip install -r requirements.txt
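After installing, the following sketch reports which entries of requirements.txt are not installed in the current environment. It assumes requirements.txt uses plain "name[specifier]" lines and that each name is the package's distribution name; it uses only the standard library (importlib.metadata, Python 3.8+).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def requirement_names(path="requirements.txt"):
    """Extract distribution names, skipping blanks, comments and pip options."""
    names = []
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # e.g. "-r other.txt" or "--index-url ..."
        # The name ends at the first version specifier, extra, marker or space.
        names.append(re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0])
    return names


def missing_requirements(path="requirements.txt"):
    """Return requirement names that are not installed in this environment."""
    missing = []
    for name in requirement_names(path):
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__" and Path("requirements.txt").exists():
    print(missing_requirements() or "All requirements are installed.")
```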

Usage

Prepare test data

Test data files must be named in the format species_TissueNumber_data.csv. For example, human_Pancreas11_data.csv is a data file containing 11 human pancreas cells.
The test single-cell transcriptomics csv file should be normalized with the default LogNormalize method of Seurat (R package), with each column representing a cell and each row representing a gene, as shown below.

        Cell 1  Cell 2  Cell 3  ...
Gene 1  0       2.4     5.0     ...
Gene 2  0.8     1.1     4.3     ...
Gene 3  1.8     0       0       ...
...     ...     ...     ...     ...
All test data must be placed under the test directory: human datasets in ./test/human and mouse datasets in ./test/mouse.
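If Seurat is not at hand, the following sketch reproduces the LogNormalize formula (natural log of count / cell total × 10,000, plus one) in plain Python and writes the result with the required name and location. It is an illustration, not part of scDeepSort: the helper names are hypothetical, and the CSV layout (empty top-left cell, cell names in the header row, gene names in the first column) is assumed from the table above.

```python
import csv
import math
from pathlib import Path


def log_normalize(counts, scale_factor=10000.0):
    """Seurat-style LogNormalize: log1p(count / cell_total * scale_factor).

    counts maps gene -> list of raw counts, one entry per cell, same cell order.
    """
    genes = list(counts)
    n_cells = len(counts[genes[0]])
    totals = [sum(counts[g][j] for g in genes) for j in range(n_cells)]
    return {
        g: [
            math.log1p(counts[g][j] / totals[j] * scale_factor) if totals[j] else 0.0
            for j in range(n_cells)
        ]
        for g in genes
    }


def write_test_csv(normalized, cell_names, species, tissue, root="test"):
    """Write a genes x cells CSV as root/species/species_TissueNumber_data.csv."""
    out_dir = Path(root) / species
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{species}_{tissue}{len(cell_names)}_data.csv"
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([""] + list(cell_names))  # empty corner cell, then cells
        for gene, values in normalized.items():
            writer.writerow([gene] + list(values))
    return path
```

For example, write_test_csv(log_normalize(raw), cells, "human", "Pancreas") with 11 cells produces test/human/human_Pancreas11_data.csv.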

Run

To test the data file human_Pancreas11_data.csv, execute the following command:

python run.py --species human --tissue Pancreas --test_dataset 11 --gpu -1 --threshold 0

--species The species of the cells: human or mouse.
--tissue The tissue of the cells; see Details for the supported values.
--test_dataset The dataset to be tested; following the file naming rule, this is the number of cells in the data file.
--gpu The GPU to use; -1 runs on the CPU.
--threshold The threshold used to decide whether an edge is created in the graph; the default is 0.
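To make the role of --threshold concrete, here is a hypothetical illustration, not taken from scDeepSort's code: it assumes a cell-gene graph in which a cell and a gene are connected only when the normalized expression value is strictly greater than the threshold, with the value kept as the edge weight.

```python
def build_edges(expr, threshold=0.0):
    """Hypothetical cell-gene edge list: keep (cell_index, gene, weight)
    only where the normalized value is strictly above the threshold.

    expr maps gene -> list of normalized values, one per cell.
    """
    edges = []
    for gene, values in expr.items():
        for cell_idx, value in enumerate(values):
            if value > threshold:
                edges.append((cell_idx, gene, value))  # weight = expression value
    return edges
```

Under this assumption, the default threshold of 0 links each cell to every gene it expresses, and raising the threshold drops weakly expressed genes from the graph.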

Output

For each test dataset, run.py writes a .csv file named species_Tissue_Number.csv to the result directory. For example, the output for test dataset human_Pancreas11_data.csv is human_Pancreas_11.csv.

Each line of the output file contains a predicted cell type.
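The naming rule above can be applied programmatically, and predictions can be matched back to cells. This is a sketch with hypothetical helper names; pair_predictions assumes the result file holds one prediction per line with no header, in the same order as the cell columns of the input file, which the description above implies but does not state outright.

```python
import csv
import re
from pathlib import Path

# species_TissueNumber_data.csv, e.g. human_Pancreas11_data.csv
DATA_NAME = re.compile(r"^(human|mouse)_([A-Za-z_]+?)(\d+)_data\.csv$")


def output_name(data_file):
    """Map a test data file name to its result file name (species_Tissue_Number.csv)."""
    match = DATA_NAME.match(Path(data_file).name)
    if match is None:
        raise ValueError(f"unexpected test file name: {data_file}")
    species, tissue, number = match.groups()
    return f"{species}_{tissue}_{number}.csv"


def pair_predictions(data_file, result_file):
    """Pair each cell name from the input header with one output line (assumed format)."""
    with open(data_file, newline="") as fh:
        cells = next(csv.reader(fh))[1:]  # skip the empty corner cell
    with open(result_file) as fh:
        labels = [line.strip() for line in fh if line.strip()]
    if len(cells) != len(labels):
        raise ValueError(f"{len(cells)} cells but {len(labels)} predictions")
    return list(zip(cells, labels))
```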

Details

Supported values of --tissue for human:

Adipose
Adrenal_gland
Artery
Ascending_colon
Bladder
Blood
Bone_marrow
Brain
Cervix
Chorionic_villus
Colorectum
Cord_blood
Epityphlon
Esophagus
Fallopian_tube
Female_gonad
Fetal_adrenal_gland
Fetal_brain
Fetal_calvaria
Fetal_eye
Fetal_heart
Fetal_intestine
Fetal_kidney
Fetal_liver
Fetal_Lung
Fetal_male_gonad
Fetal_muscle
Fetal_pancreas
Female_gonad
Fetal_rib
Fetal_skin
Fetal_spinal_cord
Fetal_stomach
Fetal_thymus
Gall_bladder
Heart
Kidney
Liver
Lung
Muscle
Neonatal_adrenal_gland
Omentum
Pancreas
Placenta
Pleura
Prostate
Spleen
Stomach
Temporal_lobe
Thyroid
Trachea
Ureter

Supported values of --tissue for mouse:

Bladder
Blood
Bone_marrow
Bone_Marrow_mesenchyme
Brain
Embryonic_mesenchyme
Fetal_brain
Fetal_intestine
Fetal_liver
Fetal_lung
Fetal_stomach
Intestine
Kidney
Liver
Lung
Mammary_gland
Muscle
Neonatal_calvaria
Neonatal_heart
Neonatal_muscle
Neonatal_pancreas
Neonatal_rib
Neonatal_skin
Ovary
Pancreas
Placenta
Prostate
Spleen
Stomach
Testis
Thymus
Uterus

Examples

python run.py --species human --tissue Pancreas --test_dataset 11 --gpu -1 --threshold 0

python run.py --species mouse --tissue Intestine --test_dataset 28 --gpu -1 --threshold 0
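To annotate every file under ./test/human and ./test/mouse, the commands above can be generated from the file names. This is a sketch with hypothetical helper names: it relies only on the naming rule and flags documented above, invokes run.py through a "python" executable on the PATH, and only prints the commands unless dry_run is set to False.

```python
import re
import subprocess
from pathlib import Path

# Naming rule from "Prepare test data": species_TissueNumber_data.csv
DATA_NAME = re.compile(r"^(human|mouse)_([A-Za-z_]+?)(\d+)_data\.csv$")


def command_for(data_file, gpu=-1, threshold=0):
    """Build the run.py argument list for one test file."""
    match = DATA_NAME.match(Path(data_file).name)
    if match is None:
        raise ValueError(f"unexpected test file name: {data_file}")
    species, tissue, number = match.groups()
    return [
        "python", "run.py",
        "--species", species,
        "--tissue", tissue,
        "--test_dataset", number,
        "--gpu", str(gpu),
        "--threshold", str(threshold),
    ]


def run_all(test_root="test", dry_run=True):
    """Build (and, if dry_run is False, execute) one command per test file."""
    commands = []
    for species in ("human", "mouse"):
        for path in sorted(Path(test_root, species).glob("*_data.csv")):
            cmd = command_for(path)
            commands.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_all():
        print(" ".join(cmd))
```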
