Pando Not Returning Expected GRNs and Limiting Target Genes to a Handful - Possibility of Problem with Mouse Genome? #30

Sandman-1 · 2023-01-31T18:54:49Z

Good afternoon! I have been sporadically tinkering with my GRN analysis for some time, and I am running into various problems. For starters, the number of unique target genes detected by Pando is less than 10 in my dataset, even though I have adjusted all parameters to allow for 0 correlation and a p value as insignificant as 1 for all steps. When I look at the modules, fit, and coefs slots in the SeuratPlus object, I am also seeing some transcription factors written as pure numbers without any names. Please see my code below and the attached screenshots. I don't know what's wrong, and I also don't know how to explain these problems clearly. However, I would appreciate any support and am willing to provide additional information.

`pfm <- getMatrixSet(
x = JASPAR2022,
opts = list(collection = "CORE", tax_group = 'vertebrates', all_versions = FALSE))

x <- character()
for(i in 1:length(pfm@listData)){
x[i] <- pfm@listData[[i]]@name
}

motif_2_tf <- data.frame(motif = names(pfm@listData), tf = x, origin = "JASPAR", gene_id = NA, name = NA, family = NA, symbol = NA, motif_tf = NA)
data("motif2tf")
motif2tf <- subset(motif2tf, motif %in% motif_2_tf$motif)

firstup <- function(x) {
substr(x, 1, 1) <- toupper(substr(x, 1, 1))
substr(x, 2, nchar(x)) <- tolower(substr(x, 2, nchar(x)))
x
}

motif2tf$tf <- firstup(motif2tf$tf)

TKO_int_cont_hep <- readRDS("~/TKO Multiome/Final Seurat Objects/TKO Control Hepatocytes Filtered and Integrated.rds")
TKO_int_cont_hep <- FindVariableFeatures(TKO_int_cont_hep, assay = "RNA", nfeatures = nrow(TKO_int_cont_hep@assays$RNA)) %>%
FindTopFeatures(assay = "Peaks", min.cutoff = 'q0') %>%
RunTFIDF(assay = "Peaks")

TKO_int_cont_hep <- initiate_grn(TKO_int_cont_hep, peak_assay = "Peaks", rna_assay = "RNA", exclude_exons = FALSE) %>%
find_motifs(pfm = pfm, motif_tfs = motif2tf, genome = BSgenome.Mmusculus.UCSC.mm10)
x <- TKO_int_cont_hep@grn@regions@motifs
x <- x[,unique(motif2tf$motif)]
TKO_int_cont_hep@grn@regions@motifs <- x
TKO_int_cont_hep <- infer_grn(TKO_int_cont_hep, peak_to_gene_method = "GREAT", upstream = 1e+06, downstream = 1e+06, method = "xgb", tf_cor = 0, scale = TRUE, interaction_term = "*") %>%
find_modules(p_thresh = 1, nvar_thresh = 1, min_genes_per_module = 1, rsq_thresh = 0) %>%
get_network_graph(graph_name = "full_graph", umap_method = "none") %>%
get_tf_network(graph = "full_graph", tf = "Tcf7l2")`

Sandman-1 · 2023-01-31T18:59:26Z

Some details regarding my code include adapting the provided motif2tf data frame in Pando to decapitalize the letters of each transcription factor name after the first letter to make it compatible with BSgenome.Mmusculus.UCSC.mm10 and then subsetting the JASPAR2022 pfm object to only include transcription factors that are contained within the updated motif2tf data frame. I can edit my own motif2tf data frame object to keep transcription factors in singular fashion in the future so that I don't have to subset the pfm object in order to avoid the downstream colMaxs error, but I just wanted to keep things simple for this first iteration. Also, that last part of my code (tf network for Tcf7l2) does not even work because it's not listed as a transcription factor in the network graph (as shown in the screenshot). All this is really odd because my project revolves around showing the importance of this transcription factor in driving a lot of zonated hepatocyte activity.

joschif · 2023-02-01T16:03:39Z

Hi @Sandman-1,
unfortunately I'm also not fully sure what the cause of your issues is. I'm aware that there are a number of possible pitfalls when adapting the builtin resources to other genomes. Also I would recommend constraining the input peaks and features somehow as this results in much better results in my experience.
Cheers, J

sylestiel · 2023-02-01T19:00:30Z

@Sandman-1

Curious as to why the targets do not have gene names and have nomenclature like C6.... etc? I encountered the same problem.

joschif · 2023-02-01T19:40:15Z

Yeah that's quite odd, I haven't seen that before... could you maybe give me access to some example dataset where that happens? Could imagine that there is a problem with the motif to tf matching but not sure.

sylestiel · 2023-02-01T20:26:53Z

@joschif
Can you help with a script that conforms to Homer or Jaspar or Cisbp motif datasets?
chromVARmotifs::homer_pwms,chromVARmotifs::mouse_pwms_v2,chromVARmotifs::encode_pwms

joschif · 2023-02-01T20:41:48Z

I can try. Here's some code I used to convert JASPAR motifs to what is currently used by Pando, maybe its of use to you:

opts[['species']] <- 9606
opts[['all_versions']] <- T
collections <- c('CORE', 'CNE', 'PHYLOFACTS', 'SPLICE', 'POLII', 'FAM', 'PBM', 'PBM_HOMEO', 'PBM_HLH', 'UNVALIDATED')
names(collections) <- collections

motif_collection_list <- map(collections, function(c){
    opts[['collection']] <- c
    PFMatrixList <- getMatrixSet(JASPAR2020, opts)
    motif2tf <- PFMatrixList@listData %>% 
        map_dfr(function(x){
            tibble(
                name = x@name,
                collection = c,
                tf = unlist(str_split(str_replace(x@name, '(.+)\\(.+\\)', '\\1'), '::')),
                strand = x@strand,
                symbol = ifelse(is.null(x@tags$symbol), '-', x@tags$symbol),
                family = ifelse(is.null(x@tags$family), '-', x@tags$family)
            )
        }, .id='motif')
    return(list(pfm=PFMatrixList, df=motif2tf))
})

motifs <- map(motif_collection_list, function(x) x$pfm) %>% 
    purrr::reduce(c)

motif2tf <- map_dfr(motif_collection_list, function(x) x$df, .id='collection')

sylestiel · 2023-02-01T21:52:40Z

@joschif

When running infer_grn() what can I enter for gene category because the target is
still not showing a gene name when I run coef(muo_data).

joschif · 2023-02-02T15:31:06Z

target should be the right column, if there are no correct gene names in there then I would assume that something went wrong prior to running infer_grn(). I would need an example dataset that reproduces the problem to really figure out what's wrong though.

sylestiel · 2023-02-02T16:06:09Z

@joschif
How can I share it? Can I upload rds file here?

changwenwang09 · 2023-02-02T16:37:46Z

I ran into the same issue with analyzing the mouse data and got very few TFs (40-50) with the same target gene named "F3". The analysis was pretty much the same as what @sylestiel did. and I prepared the motifs and motif2tf data as @joschif suggested above by changing the Taxonomy ID to 10090. Also tried the motifs from cisBP and got the same error.
I am not sure whether this has anything to do with the inbuilt data sets such as the "EnsDb.Hsapiens.v93.annot.UCSC.hg38.RData" as I once got an error saying that genome(x) doesn't match genome(y) with one mm10 and one GRh38. But the error was somehow resolved after restarting R session. Any suggestions? Thanks a lot!

changwenwang09 · 2023-02-03T09:02:40Z

I rerun the infer_grn with
sobj_metacell <- infer_grn(sobj_metacell, peak_to_gene_method = 'GREAT', verbose = 2)
and got the following error message

Selecting candidate regulatory regions near genes
Preparing model input
Fitting models for 3783 target genes
...
Warning: Sorcs1 not found in EnsDb
Warning: Add3 not found in EnsDb
Warning: Rbm20 not found in EnsDb
Warning: Pdcd4 not found in EnsDb
...

only target genes like "F3", "C2", etc were successfully run.

I was not able to find the source code for infer_grn, but the "find_peaks_near_genes" function has the same log message
log_message('Selecting candidate regulatory regions near genes', verbose=verbose)

in the source code of "find_peaks_near_genes" function, there the inbuilt data "EnsDb.Hsapiens.v93.annot.UCSC.hg38" was used

 else if (method == "GREAT") {
        utils::data(EnsDb.Hsapiens.v93.annot.UCSC.hg38, envir = environment())
        gene_annot_use <- EnsDb.Hsapiens.v93.annot.UCSC.hg38[which(EnsDb.Hsapiens.v93.annot.UCSC.hg38$gene_name %in% 
            genes$gene_name), ]

I am not sure whether this is also used in the infer_grn function. But as the genes with only one letter were successfully run, and those with more than one letter failed, (eg, F3 and F3 are the same in mouse and human, while Sorcs1 and SORCS1 are different in mouse and human), do you think this might be the problem? How would you suggest fixing it?

Thanks a lot!

joschif · 2023-02-03T09:13:43Z

Ah thanks, I think that might be the source of the problem. Have you tried running with peak_to_gene_method='Signac'? The option 'GREAT' currently loads a custom version of the human EnsDB which makes it incompatible with other genomes. You can try to modify the function so that it accepts any ensdb as an argument, otherwise I would recommend just using the "Signac" method, which automatically uses the gene annotations provided in your Seurat object.

changwenwang09 · 2023-02-03T09:40:39Z

@joschif Thank you for the reply!
I used the "Signac" method, but always got the following error

Selecting candidate regulatory regions near genes
Preparing model input
Fitting models for 3783 target genes
Error in str2lang(x) : :1:8: unexpected symbol
1: 1110015O18Rik
^

Do you have any suggestions on what might be the problem?

joschif · 2023-02-03T09:45:02Z

Yeah that's a known problem. The lm fromulas don't work with feature names starting with numbers. I haven't gotten around to fixing it, but I would suggest to filter these genes from the list of input features/TFs as they should be the minority and usually are not protein-coding genes anyway

changwenwang09 · 2023-02-03T12:13:04Z

Thanks a lot! After removing the genes starting with numbers, the code was successfully run and I got a lot of TFs and targeting genes!

I would still like to try the "GREAT" method. I created the mouse data as below

edb_m <- EnsDb.Mmusculus.v79
EnsDb.Mmusculus.v79.mm10 <- genes(edb_m, 
                                  columns = c(listColumns(edb_m, "tx"), 
                                              "gene_name", "gene_biotype"))
seqlevelsStyle(EnsDb.Mmusculus.v79.mm10) <- "UCSC"

and then changed the names of the elementmetadata as what you have in the human data.

did you also get the "EnsDb.Hsapiens.v93.annot.UCSC.hg38" data this way? If not, could you let me know how you did it?
Thanks a lot!

sylestiel · 2023-02-03T16:50:54Z

@joschif @Sandman-1

How can I remove the Rik genes?

Sandman-1 · 2023-02-03T22:38:51Z

Wow, I missed a lot! Trying GRN Analysis using the Signac method then. Eager to see new results! :)))

sylestiel · 2023-02-05T18:03:33Z

@Sandman-1 @changwenwang09

Have either of you gotten it to work yet? If so, I could use some help here.

Thanks!

changwenwang09 · 2023-02-06T16:31:56Z

@sylestiel
mine was done with this

params <- Params(sobj)
genes <- VariableFeatures(sobj, assay = params$rna_assay)
genes <- genes[!grepl("Rik", genes)]

hope it also works with your data :)

sylestiel · 2023-02-06T20:27:27Z

@changwenwang09

That was very helpful. It worked! Thank you very much!!!!

sylestiel · 2023-02-07T20:16:14Z

@changwenwang09 @Sandman-1

Did either of you get GREAT to work with mouse data?

Usamahussein551980 · 2023-03-09T00:28:25Z

Hello everybody,
Thank you so much for your helpful information with your codes.
@Sandman-1 @sylestiel, I am not very well in R, so Could you please provide me with the data('phastConsElements20Mammals.UCSC.mm10')? The Pando tool provided for the human, and when I tried to use:
regions = StringToGRanges(Links(muo_data[['peaks']])$peak); it gave me a few linkPleaks:

I am using multiome data generated from mouse, so I tried a lot to solve this problem, but no success, so It would be helpful if you could provide me the mouse genome region to enable inferring the GRN.

I checked this website: http://hgdownload.cse.ucsc.edu/goldenpath/mm10/phastCons60way/
but failed to download through the linux command.

Thanks for your helpful with the forementioned codes in your comments.

sylestiel · 2023-03-09T15:45:43Z

Hi @Usamahussein551980

I had the exact same issues as you. Finally, I ended up with the following approach.

library(EnsDb.Mmusculus.v79)

DefaultAssay(muo_data) <- "ATAC"

extract gene annotations from EnsDb

annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)

change to UCSC style since the data was mapped to mm10

seqlevelsStyle(annotations) <- 'UCSC'

add the gene information to the object

Annotation(muo_data) <- annotations

Hope it helps!

Usamahussein551980 · 2023-03-10T22:12:44Z

Hi @sylestiel
Thank you very much for your information you provided, it helped me a lot.
I successfully got there, so I got a lot of TFs.

I am still curious to use 'GREAT' for peak_to_gene_metod.

simpakkk mentioned this issue Jul 29, 2024

Gene dropout from modules and errors for alternative infer_grn methods #66

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pando Not Returning Expected GRNs and Limiting Target Genes to a Handful - Possibility of Problem with Mouse Genome? #30

Pando Not Returning Expected GRNs and Limiting Target Genes to a Handful - Possibility of Problem with Mouse Genome? #30

Sandman-1 commented Jan 31, 2023

Sandman-1 commented Jan 31, 2023

joschif commented Feb 1, 2023

sylestiel commented Feb 1, 2023

joschif commented Feb 1, 2023

sylestiel commented Feb 1, 2023

joschif commented Feb 1, 2023

sylestiel commented Feb 1, 2023

joschif commented Feb 2, 2023

sylestiel commented Feb 2, 2023

changwenwang09 commented Feb 2, 2023

changwenwang09 commented Feb 3, 2023

joschif commented Feb 3, 2023 •

edited

Loading

changwenwang09 commented Feb 3, 2023

joschif commented Feb 3, 2023 •

edited

Loading

changwenwang09 commented Feb 3, 2023

sylestiel commented Feb 3, 2023

Sandman-1 commented Feb 3, 2023

sylestiel commented Feb 5, 2023

changwenwang09 commented Feb 6, 2023

sylestiel commented Feb 6, 2023

sylestiel commented Feb 7, 2023

Usamahussein551980 commented Mar 9, 2023

sylestiel commented Mar 9, 2023

Usamahussein551980 commented Mar 10, 2023

Pando Not Returning Expected GRNs and Limiting Target Genes to a Handful - Possibility of Problem with Mouse Genome? #30

Pando Not Returning Expected GRNs and Limiting Target Genes to a Handful - Possibility of Problem with Mouse Genome? #30

Comments

Sandman-1 commented Jan 31, 2023

Sandman-1 commented Jan 31, 2023

joschif commented Feb 1, 2023

sylestiel commented Feb 1, 2023

joschif commented Feb 1, 2023

sylestiel commented Feb 1, 2023

joschif commented Feb 1, 2023

sylestiel commented Feb 1, 2023

joschif commented Feb 2, 2023

sylestiel commented Feb 2, 2023

changwenwang09 commented Feb 2, 2023

changwenwang09 commented Feb 3, 2023

joschif commented Feb 3, 2023 • edited Loading

changwenwang09 commented Feb 3, 2023

joschif commented Feb 3, 2023 • edited Loading

changwenwang09 commented Feb 3, 2023

sylestiel commented Feb 3, 2023

Sandman-1 commented Feb 3, 2023

sylestiel commented Feb 5, 2023

changwenwang09 commented Feb 6, 2023

sylestiel commented Feb 6, 2023

sylestiel commented Feb 7, 2023

Usamahussein551980 commented Mar 9, 2023

sylestiel commented Mar 9, 2023

extract gene annotations from EnsDb

change to UCSC style since the data was mapped to mm10

add the gene information to the object

Usamahussein551980 commented Mar 10, 2023

joschif commented Feb 3, 2023 •

edited

Loading

joschif commented Feb 3, 2023 •

edited

Loading