Driver gene identification #77
Hey! Let me explain a little about how EWCE works, since what you are asking doesn't really fit into EWCE's usual use cases. EWCE uses reference single-cell/single-nucleus RNA-seq datasets to work out the specificity of each gene's expression to specific cell types. In this sense, you can get these specificity values for each cell type:
However, going to your cell type of interest and simply picking the most specific genes may not be a sensible thing to do: a gene's actual expression level could be very low, yet it can still appear highly specific if the small amount of expression that was detected happened to come from this cell type. It is worth understanding what we mean by specificity here before trying to use this data for your own problem. You want:
From what I understand, you are trying to get a list of genes that are specific to a cell type. This could be done by taking, say, the top 10% quantile of genes from this specificity matrix, using a reference dataset from the region of interest (see here for documentation on creating your own). However, note the caveats and possible issues with this that I give above. Have a read of the original publication to get a better understanding of these concepts and how we use them in EWCE: https://www.frontiersin.org/articles/10.3389/fnins.2016.00016/full Cheers, |
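To make the specificity caveat and the top-10% idea concrete, here is a minimal base-R sketch on a made-up matrix. It assumes specificity is computed roughly as in the paper linked above (each cell type's share of a gene's summed mean expression). `top_specific_genes()` and its `min_exp` filter are hypothetical helpers for illustration, not EWCE functions; with a real CellTypeDataset you would presumably pass the specificity matrix for one annotation level (something like `ctd[[1]]$specificity`).

```r
# Made-up mean expression: rows = genes, columns = cell types (illustration only)
mean_exp <- matrix(
  c(50,  10,  5,
    0.2,  0,  0,
     5,   5,  5,
    30,   1,  1,
     2,   8, 40),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("Gene", LETTERS[1:5]),
                  c("neurons", "astrocytes", "microglia"))
)

# Specificity: share of a gene's summed mean expression found in each cell type
# (rowSums() is recycled down each column, so row i is divided by its own total)
specificity <- mean_exp / rowSums(mean_exp)

# Hypothetical helper (not an EWCE function): genes in the top `prop` fraction of
# specificity for one cell type, optionally dropping genes whose mean expression
# in that cell type is below `min_exp`
top_specific_genes <- function(specificity, cell_type, prop = 0.1,
                               mean_exp = NULL, min_exp = 0) {
  s <- specificity[, cell_type]
  s <- s[!is.na(s)]  # genes with zero expression everywhere give 0/0 = NaN
  if (!is.null(mean_exp)) {
    s <- s[mean_exp[names(s), cell_type] >= min_exp]
  }
  cutoff <- stats::quantile(s, probs = 1 - prop, names = FALSE)
  names(s)[s >= cutoff]
}

top_specific_genes(specificity, "neurons")
# "GeneB"
top_specific_genes(specificity, "neurons", mean_exp = mean_exp, min_exp = 1)
# "GeneD"
```

On this toy matrix the unfiltered call returns GeneB, whose mean expression is only 0.2 but whose specificity is 1. Adding a minimum-expression filter returns GeneD instead, which is the caveat above in miniature.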
@Al-Murphy I think what @forrestzhaosen is asking for is actually something a bit different, i.e., given cell-type-specific enrichment for some gene list, which cell-type-specific genes are most strongly driving that enrichment? Since not all genes will overlap between the gene list and the genes in the top specificity quantiles, simply taking the latter wouldn't be sufficient. EWCE doesn't currently return this kind of information, but I take your point @forrestzhaosen.
One simple solution would be to extend what @Al-Murphy is suggesting and find the intersection between the genes in the gene list and the genes in the top specificity quantiles for a given enriched cell type. Ideally, we would want to collect some stats on this across all bootstrap iterations. I don't think this would be too hard to implement, and it might be quite useful in certain cases. |
You can get the bootstrapping plots, which show the probability of each gene being there at each position in the ranked list. There is a function in EWCE for this.
But I don't think there is a subset of genes driving the enrichments in most cases. |
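The intersection approach proposed in this thread (gene list intersected with the top-specificity genes of an enriched cell type, plus some stats across resampling iterations) could look roughly like the sketch below. It is a simplified, hypothetical stand-in: `overlap_drivers()` is not an EWCE function, the "stats across iterations" are just the overlap sizes of random background lists of the same length, and there is no control for transcript length or GC content as in EWCE's own bootstrap options. `top_genes` is assumed to come from something like the top specificity quantile for the enriched cell type.

```r
# Hypothetical helper, not part of EWCE.
# hits: gene list; top_genes: top-specificity genes for the enriched cell type;
# background: all genes considered in the test.
overlap_drivers <- function(hits, top_genes, background, reps = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  hits <- intersect(hits, background)    # only genes present in the background count
  drivers <- intersect(hits, top_genes)  # candidate "driver" genes
  # Overlap with top_genes for random lists of the same size as `hits`
  null_overlap <- vapply(seq_len(reps), function(i) {
    length(intersect(sample(background, length(hits)), top_genes))
  }, integer(1))
  list(
    drivers = drivers,
    observed = length(drivers),
    # Empirical p-value with the usual +1 correction
    p = (sum(null_overlap >= length(drivers)) + 1) / (reps + 1)
  )
}

background <- paste0("g", 1:100)
res <- overlap_drivers(hits = c("g1", "g2", "g3", "g50", "gX"),
                       top_genes = paste0("g", 1:10),
                       background = background, reps = 1000, seed = 1)
res$drivers
# "g1" "g2" "g3"
```

Returning the overlapping genes together with an empirical p-value keeps the "which genes" and "is this overlap unusual" questions in one result, though a real implementation would reuse the same random lists as the main enrichment test.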
Oh right, forgot about that function! This is more what I meant. If I'm remembering correctly, though, the plots were implemented in such a way that the enrichment tests had to be rerun, which isn't ideal since it doubles compute time and isn't tied directly to your first round of enrichment results (due to the stochasticity). If I am indeed remembering correctly, maybe we can add a feature to the main bootstrapping function that computes the per-gene probabilities and stores them in another slot within the output. |
I believe this is the function in question, @NathanSkene. And here is what I mean about the bootstrapping tests being redone: EWCE/R/generate_bootstrap_plots.r, line 157 (commit 9b5b21f).
Looking into the code now to see if we can just record these gene-wise probabilities during the bootstrap tests conducted by |
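A rough sketch of recording gene-wise statistics in the same pass as the main test is below. Everything here is an assumption for illustration: `bootstrap_with_gene_stats()` is hypothetical, the test statistic is the summed specificity of the list (the idea behind EWCE's bootstrap, but without its transcript-length/GC controls), and "probability at each position in the ranked list" is read as: for the k-th most specific hit, the fraction of random lists whose k-th most specific gene is at least as specific.

```r
# Hypothetical sketch, not EWCE code.
# spec: named numeric vector of specificity for one cell type (names = genes)
bootstrap_with_gene_stats <- function(spec, hits, background = names(spec),
                                      reps = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  hits <- intersect(hits, background)
  n <- length(hits)
  hit_spec <- sort(spec[hits], decreasing = TRUE)  # hits ranked by specificity

  # One column per random list, each sorted the same way (n x reps matrix;
  # matrix() keeps the shape even when n == 1)
  boot <- matrix(vapply(seq_len(reps), function(i) {
    unname(sort(spec[sample(background, n)], decreasing = TRUE))
  }, numeric(n)), nrow = n)

  list(
    # Overall test: random lists with total specificity >= that of the hits
    p = (sum(colSums(boot) >= sum(hit_spec)) + 1) / (reps + 1),
    # Per rank k: random lists whose k-th value is >= the k-th hit
    # (hit_spec is recycled down each column, so row k is compared with hit k)
    gene_stats = data.frame(
      gene = names(hit_spec),
      specificity = unname(hit_spec),
      p_rank = (rowSums(boot >= hit_spec) + 1) / (reps + 1),
      row.names = NULL
    )
  )
}

spec <- setNames(c(0.9, 0.8, 0.7, seq(0.01, 0.2, length.out = 47)), paste0("g", 1:50))
out <- bootstrap_with_gene_stats(spec, hits = c("g1", "g2", "g3"), reps = 500, seed = 42)
out$gene_stats
```

Because the per-rank values come from the same `boot` matrix as the overall p-value, they are tied to that run's random draws, so no second round of resampling is needed.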
Thank you all! That's really helpful. |
@forrestzhaosen Please do include a reproducible example using the Bugs template in a new Issue. Otherwise there's not much we can do for you. In other news, making progress on implementing the built-in gene scoring for |
Thanks! That's amazing. |
Ok, so I rewrote much of [...]. Here's what the plots look like now.
Reprex:
```r
## Load the single cell data
sct_data <- ewceData::ctd()
## Set the parameters for the analysis
## Use 5 bootstrap lists for speed, for publishable analysis use >10000
reps <- 5
## Load the gene list and get human orthologs
hits <- ewceData::example_genelist()[1:100]
## Bootstrap significance test,
## no control for transcript length or GC content
## Use pre-computed results to speed up example
full_results <- EWCE::example_bootstrap_results()
output <- EWCE::generate_bootstrap_plots(
    sct_data = sct_data,
    hits = hits,
    reps = reps,
    full_results = full_results,
    listFileName = "Example",
    sctSpecies = "mouse",
    genelistSpecies = "human",
    annotLevel = 1,
    save_dir = tempdir()
)
```
[Plots 1 to 4: images not preserved]
Let me know if anything seems awry with these, @NathanSkene. |
Actually, shouldn't the y-axes be labeled "Specificity in cell type"? It's the specificity matrix that is ultimately being used to generate these plots (currently, and before I touched anything). |
Note to self, I'll also revamp the |
@Al-Murphy I've pushed the changes I've made so far to [...].
@forrestzhaosen, to install the dev version of EWCE, use:
```r
remotes::install_github("NathanSkene/EWCE", dependencies = TRUE, upgrade = "always")
```
|
Thank you so much! It's greatly appreciated. |
Changes to the |
Is it possible to generate a list of genes that drive the significance of the enriched cell type?
Thanks!