UniCell Deconvolve applied to 10X Genomics Visium Gene Expression Slide of Breast Adenocarcinoma Sample
The amount of publicly available high-dimensional transcriptomic data, whether bulk RNA, single-cell, or spatial, has increased exponentially in recent years. Although available for reanalysis, published data is often used in isolation to augment novel analyses. In particular, the problem of cell type deconvolution, from either bulk or spatial transcriptomic datasets, has been addressed by numerous methods that use publicly available datasets as cell-type-specific references. The choice of reference profile, however, is not always readily apparent or available, and a mismatch between reference and actual cell types may confound study results.
UniCell Deconvolve (UCD) is a pre-trained deep learning model that provides context-free estimates of cell type fractions from whole-transcriptome expression data for bulk, single-cell, and spatial transcriptomics. The model is trained on the world's largest fully integrated scRNA-Seq training database, comprising 28M+ single cells spanning 840+ cell types from 899 studies to date. Extensive benchmarking shows UCD compares favorably with reference-based deconvolution tools, without the need for pretraining. UCD demonstrates strong multi-task performance across a range of deconvolution challenges spanning several transcriptomic data modalities, disease types, and tissues.
Nested rectangles visualizing the cell type distribution hierarchy for the 28 million single cells comprising the UCD Database to date
The UCD package lets you integrate UCD predictions directly into any transcriptomics data analysis pipeline through a web-based API. The package available here provides a secure and scalable connection to the latest pre-trained UCD model, built on top of Google Cloud Platform, which serves deconvolution requests. To access the current alpha build of UCD, we ask users to sign up for an early-access API key here. Please allow up to 24 hours to receive a response.
The package includes preprocessing and visualization capabilities and is designed to interface with annotated dataset (AnnData) and scanpy workflows.
We recommend installing ucdeconvolve in a virtual environment using tools such as conda or miniconda. We suggest the following installation:
conda create -n ucdenv python=3.8 pytables jupyter jupyterlab
conda activate ucdenv
pip install ucdeconvolve
UniCell Deconvolve can also be installed from PyPI into an existing Python workspace. The pytables package is required and may need to be installed separately using a package manager such as conda before installing ucdeconvolve. For detailed installation instructions, see the documentation.
pip install ucdeconvolve
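After installing, a quick check that the required packages are visible to the current interpreter can save a confusing import error later. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical and not part of ucdeconvolve. It looks modules up without importing them, so it does not confirm that a package actually loads correctly.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "tables" is the import name of the pytables package.
required = ["ucdeconvolve", "tables", "scanpy"]
missing = missing_modules(required)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, install it in the active environment before continuing.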
Full documentation with supporting tutorials is available here.
To demonstrate the functionality of UCD, we will perform a cell type deconvolution of a spatial gene expression section of the human lymph node, made available by 10X Genomics. We will utilize scanpy to quickly load the dataset, and then pass it into ucdeconvolve to obtain cell type predictions.
Load the ucdeconvolve package and run the "ucd.api.register()" command as shown below. Follow the instructions by inputting the required information at each step.
import ucdeconvolve as ucd
ucd.api.register()
Upon completion of the initial registration form, you will receive an email at the address specified with an activation code. Copy the code and paste it into the waiting input prompt to activate your account, or pass the activation code to the function "ucd.api.activate(code)".
ucd.api.activate(code)
Upon completion of activation, you will receive an email with your user access token. This token is automatically attached to your current Python instance if you are running ucd.api.register. Otherwise, you can authenticate a new Python instance at any time with a valid API token using the function "ucd.api.authenticate".
ucd.api.authenticate(token)
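To keep the access token out of notebooks and scripts, one option is to read it from an environment variable before calling ucd.api.authenticate. This is a minimal sketch: the helper `read_token` and the variable name `UCD_API_TOKEN` are assumptions chosen for illustration, not part of ucdeconvolve.

```python
import os


def read_token(env_var="UCD_API_TOKEN", environ=None):
    """Return the token stored in env_var, raising a clear error if it is unset or blank.

    UCD_API_TOKEN is a hypothetical variable name; use whatever name suits your setup.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your UCD access token.")
    return token


# Usage (requires ucdeconvolve and a valid token):
# import ucdeconvolve as ucd
# ucd.api.authenticate(read_token())
```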
import ucdeconvolve as ucd
import scanpy as sc
adata = sc.datasets.visium_sge("V1_Human_Lymph_Node")
AnnData object with n_obs × n_vars = 4035 × 36601
obs: 'in_tissue', 'array_row', 'array_col'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatial'
obsm: 'spatial'
ucd.tl.base(adata)
Example Console Output:
2023-04-25 16:27:40,012|[UCD]|INFO: Starting UCDeconvolveBASE Run. | Timer Started.
Preprocessing Dataset | 100% (16 of 16) || Elapsed Time: 0:00:02 Time: 0:00:02
2023-04-25 16:27:43,509|[UCD]|INFO: Uploading Data | Timer Started.
2023-04-25 16:27:49,367|[UCD]|INFO: Upload Complete | Elapsed Time: 5.857 (s)
Waiting For Submission : UNKNOWN | Queue Size : 0 | \ |#| 2 Elapsed Time: 0:00:03
Waiting For Completion | 100% (4035 of 4035) || Elapsed Time: 0:00:45 Time: 0:00:45
2023-04-25 16:28:42,073|[UCD]|INFO: Download Results | Timer Started.
2023-04-25 16:28:42,817|[UCD]|INFO: Download Complete | Elapsed Time: 0.743 (s)
2023-04-25 16:28:43,466|[UCD]|INFO: Run Complete | Elapsed Time: 63.453 (s)
We can print our adata object to see what new information has been added to it. UCD appends the results of each deconvolution run to 'adata.obsm', and stores column names (i.e. cell types) and run information in 'adata.uns' under the default results stem 'ucdbase'. Depending on whether the split parameter is set to True or False, you will see either a single new entry in 'adata.obsm' or three entries. By default, split = True, so predictions are split into primary (non-malignant), cell lines, and primary cancer (malignant).
AnnData object with n_obs × n_vars = 4035 × 36601
obs: 'in_tissue', 'array_row', 'array_col'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatial', 'ucdbase'
obsm: 'spatial', 'ucdbase_cancer', 'ucdbase_lines', 'ucdbase_primary', 'ucdbase_raw'
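A common next step is to list the dominant predicted cell types for a single spot. The sketch below assumes you have already pulled one row of predictions (for example from adata.obsm['ucdbase_primary']) and the matching list of cell type names. How the names are laid out inside adata.uns['ucdbase'] is not described here, so check the documentation for that. The helper `top_cell_types` is hypothetical, and the toy values are invented purely to make the example runnable.

```python
def top_cell_types(fractions, cell_types, k=3):
    """Return the k (cell_type, fraction) pairs with the largest fractions for one spot."""
    if len(fractions) != len(cell_types):
        raise ValueError("fractions and cell_types must have the same length")
    # Pair each name with its fraction, then sort from largest to smallest fraction.
    ranked = sorted(zip(cell_types, fractions), key=lambda pair: pair[1], reverse=True)
    return [(name, float(value)) for name, value in ranked[:k]]


# Toy row standing in for one spot of predictions (values are made up).
example_types = ["b cell", "t cell", "macrophage", "fibroblast"]
example_row = [0.50, 0.30, 0.15, 0.05]
print(top_cell_types(example_row, example_types, k=2))
```

Because the function only relies on `len`, iteration, and `float`, it should work equally well with a row taken from a NumPy array.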
We can visualize our results by using one of the built-in plotting functions in UCD, which wrap scanpy's plotting API.
ucd.pl.spatial(adata, color = "germinal center b cell")
Predicted germinal center b cell distribution across lymph node section