UCSCXenaTools is an R package for accessing genomics data from the UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. Public omics data from UCSC Xena are served through multiple turn-key Xena Hubs, a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. The databases are normalized so they can be combined, linked, filtered, explored and downloaded.
Who is the target audience and what are scientific applications of this package?
- Target Audience: cancer and clinical researchers, bioinformaticians
- Applications: genomic and clinical analyses
Install the stable release from CRAN with:
install.packages("UCSCXenaTools")
You can also install the development version of UCSCXenaTools from GitHub with:
# install.packages("remotes")
remotes::install_github("ropensci/UCSCXenaTools")
To build the vignettes locally, add two options:
remotes::install_github("ropensci/UCSCXenaTools", build_vignettes = TRUE, dependencies = TRUE)
All datasets are available at https://xenabrowser.net/datapages/.
Currently, UCSCXenaTools supports the following data hubs of UCSC Xena.
- UCSC Public Hub: https://ucscpublic.xenahubs.net/
- TCGA Hub: https://tcga.xenahubs.net/
- GDC Xena Hub: https://gdc.xenahubs.net/
- ICGC Xena Hub: https://icgc.xenahubs.net/
- Pan-Cancer Atlas Hub: https://pancanatlas.xenahubs.net/
- UCSC Toil RNAseq Recompute Compendium Hub: https://toil.xenahubs.net/
- PCAWG Xena Hub: https://pcawg.xenahubs.net/
- ATAC-seq Hub: https://atacseq.xenahubs.net/
- Single Cell Xena Hub: https://singlecellnew.xenahubs.net/
- Kids First Xena Hub: https://kidsfirst.xenahubs.net/
- Treehouse Xena Hub: https://xena.treehouse.gi.ucsc.edu:443/
Users can manually update the dataset list to the newest version of UCSC Xena with the XenaDataUpdate() function, then restart R and run library(UCSCXenaTools) again.
If the URL of a data hub changes or a new data hub comes online, please let me know by emailing [email protected] or opening an issue on GitHub.
Downloading UCSC Xena datasets and loading them into R with UCSCXenaTools is a five-step workflow: generate, filter, query, download and prepare. These steps are implemented as the XenaGenerate, XenaFilter, XenaQuery, XenaDownload and XenaPrepare functions, respectively. The functions are simple to use and combine well with other packages such as dplyr.
To show the basic usage of UCSCXenaTools, we will download clinical data for LUNG, LUAD and LUSC from the TCGA (hg19 version) data hub. To learn more about UCSCXenaTools, run browseVignettes("UCSCXenaTools") and read the vignettes.
UCSCXenaTools uses a built-in data.frame object, XenaData, to generate an instance of the XenaHub class. XenaData records information on all datasets in the UCSC Xena data hubs. You can load XenaData after loading UCSCXenaTools into R.
library(UCSCXenaTools)
#> =========================================================================================
#> UCSCXenaTools version 1.4.2
#> Project URL: https://github.com/ropensci/UCSCXenaTools
#> Usages: https://cran.r-project.org/web/packages/UCSCXenaTools/vignettes/USCSXenaTools.html
#>
#> If you use it in published research, please cite:
#> Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
#> from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq.
#> Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627
#> =========================================================================================
#> --Enjoy it--
data(XenaData)
head(XenaData)
#> # A tibble: 6 x 17
#> XenaHosts XenaHostNames XenaCohorts XenaDatasets SampleCount DataSubtype Label
#> <chr> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 https://… publicHub Breast Can… ucsfNeve_pu… 51 gene expre… Neve…
#> 2 https://… publicHub Breast Can… ucsfNeve_pu… 57 phenotype Phen…
#> 3 https://… publicHub Glioma (Ko… kotliarov20… 194 copy number Kotl…
#> 4 https://… publicHub Glioma (Ko… kotliarov20… 194 phenotype Phen…
#> 5 https://… publicHub Lung Cance… weir2007_pu… 383 copy number CGH
#> 6 https://… publicHub Lung Cance… weir2007_pu… 383 phenotype Phen…
#> # … with 10 more variables: Type <chr>, AnatomicalOrigin <chr>,
#> # SampleType <chr>, Tags <chr>, ProbeMap <chr>, LongTitle <chr>,
#> # Citation <chr>, Version <chr>, Unit <chr>, Platform <chr>
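Because XenaData is an ordinary data frame, it can be explored before any download is started. The following is a minimal sketch, assuming the dplyr package is installed and that the XenaHostNames and DataSubtype columns look as in the output above; the variable names hub_counts and tcga_subtypes are only illustrative.

```r
library(UCSCXenaTools)
library(dplyr)

data(XenaData)

# Number of datasets registered per data hub, largest first
hub_counts <- XenaData %>%
  count(XenaHostNames, sort = TRUE)
hub_counts

# Data subtypes offered by the TCGA hub, with the number of datasets of each
tcga_subtypes <- XenaData %>%
  filter(XenaHostNames == "tcgaHub") %>%
  count(DataSubtype, sort = TRUE)
head(tcga_subtypes)
```

The exact counts depend on the version of XenaData shipped with the installed package, so no expected output is shown here.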
Select datasets.
# The options in XenaFilter function support Regular Expression
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>%
XenaFilter(filterDatasets = "clinical") %>%
XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo
df_todo
#> class: XenaHub
#> hosts():
#> https://tcga.xenahubs.net
#> cohorts() (3 total):
#> TCGA Lung Cancer (LUNG)
#> TCGA Lung Adenocarcinoma (LUAD)
#> TCGA Lung Squamous Cell Carcinoma (LUSC)
#> datasets() (3 total):
#> TCGA.LUNG.sampleMap/LUNG_clinicalMatrix
#> TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#> TCGA.LUSC.sampleMap/LUSC_clinicalMatrix
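To preview which dataset names a regular expression would match before building a XenaHub object, the same patterns can be tried directly on XenaData with base R. This is only a sketch: it assumes plain, case-sensitive grepl() matching on the XenaDatasets column, which may differ in detail from what XenaFilter does internally.

```r
library(UCSCXenaTools)

data(XenaData)

# Restrict to the TCGA hub, mirroring XenaGenerate(subset = XenaHostNames == "tcgaHub")
tcga <- XenaData[XenaData$XenaHostNames == "tcgaHub", ]

# Apply the same two patterns used in the XenaFilter calls
is_clinical <- grepl("clinical", tcga$XenaDatasets)
is_lung <- grepl("LUAD|LUSC|LUNG", tcga$XenaDatasets)

# Dataset names that satisfy both patterns
matched <- tcga$XenaDatasets[is_clinical & is_lung]
matched
```

Checking patterns this way is cheap because it needs no network access; the actual selection should still go through XenaGenerate and XenaFilter.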
Query and download.
XenaQuery(df_todo) %>%
XenaDownload() -> xe_download
For researchers in China, the Hiplot team has deployed several Xena mirror sites in Shanghai. Set options(use_hiplot = TRUE) before the data querying step to speed up both querying and downloading.
options(use_hiplot = TRUE)
XenaQuery(df_todo) %>%
XenaDownload() -> xe_download
#> Use hiplot server (China) for mirrored data hubs (set 'options(use_hiplot = FALSE)' to disable it)
#> This will check url status, please be patient.
#> All downloaded files will under directory /var/folders/mx/rfkl27z90c96wbmn3_kjk8c80000gn/T//Rtmp4UYCMN.
#> The 'trans_slash' option is FALSE, keep same directory structure as Xena.
#> Creating directories for datasets...
#> Downloading TCGA.LUNG.sampleMap/LUNG_clinicalMatrix
#> Downloading TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#> Downloading TCGA.LUSC.sampleMap/LUSC_clinicalMatrix
Load the downloaded data into R for analysis.
cli = XenaPrepare(xe_download)
class(cli)
#> [1] "list"
names(cli)
#> [1] "LUNG_clinicalMatrix" "LUAD_clinicalMatrix" "LUSC_clinicalMatrix"
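Since XenaPrepare returns a named list here, each clinical table can be inspected or extracted by name. The sketch below repeats the steps above so that it runs on its own; it needs network access, and it assumes that each list element is a data frame (the names luad_cli and table_dims are only illustrative).

```r
library(UCSCXenaTools)

# Repeat the walkthrough steps so this block is self-contained (needs network access)
xe_download <- XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") %>%
  XenaQuery() %>%
  XenaDownload()

cli <- XenaPrepare(xe_download)

# Rows and columns of each clinical table
table_dims <- lapply(cli, dim)
table_dims

# Pull out one table by name for further analysis
luad_cli <- cli[["LUAD_clinicalMatrix"]]
head(colnames(luad_cli))
```

Extracting by name with [[ ]] avoids relying on the order of the list, which follows the order of the downloaded datasets.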
- Introduction and basic usage of UCSCXenaTools
- UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis
- Obtain RNAseq Values for a Specific Gene in Xena Database
- UCSC Xena Access APIs in UCSCXenaTools
If you use UCSCXenaTools, please cite the following paper.
Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq.
Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627
# For BibTex
@article{Wang2019UCSCXenaTools,
journal = {Journal of Open Source Software},
doi = {10.21105/joss.01627},
issn = {2475-9066},
number = {40},
publisher = {The Open Journal},
title = {The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq},
url = {https://dx.doi.org/10.21105/joss.01627},
volume = {4},
author = {Wang, Shixiang and Liu, Xuesong},
pages = {1627},
date = {2019-08-05},
year = {2019},
month = {8},
day = {5},
}
Please cite UCSC Xena with the following paper.
Goldman, Mary, et al. "The UCSC Xena Platform for cancer genomics data
visualization and interpretation." BioRxiv (2019): 326470.
If you want to contribute, please follow these guidelines:
- Clone the project from GitHub
- Open UCSCXenaTools.Rproj with RStudio
- Modify the source code
- Run devtools::check(), and fix all errors, warnings and notes
- Create a pull request
This package is based on XenaR; thanks to Martin Morgan for his work.