forked from satijalab/seurat
-
Notifications
You must be signed in to change notification settings - Fork 0
/
seurat5_spatial_vignette.Rmd
524 lines (388 loc) · 29.8 KB
/
seurat5_spatial_vignette.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
---
title: "Analysis, visualization, and integration of spatial datasets with Seurat"
output:
html_document:
theme: united
df_print: kable
pdf_document: default
date: 'Compiled: `r Sys.Date()`'
---
```{r setup, include=FALSE}
all_times <- list() # store the time for each chunk
knitr::knit_hooks$set(time_it = local({
now <- NULL
function(before, options) {
if (before) {
now <<- Sys.time()
} else {
res <- difftime(Sys.time(), now, units = "secs")
all_times[[options$label]] <<- res
}
}
}))
knitr::opts_chunk$set(
tidy = TRUE,
tidy.opts = list(width.cutoff = 95),
message = FALSE,
warning = FALSE,
fig.width = 10,
time_it = TRUE,
error = TRUE
)
```
# Overview
This tutorial demonstrates how to use Seurat (>=3.2) to analyze spatially-resolved RNA-seq data. While the analytical pipelines are similar to the Seurat workflow for [single-cell RNA-seq analysis](pbmc3k_tutorial.html), we introduce updated interaction and visualization tools, with a particular emphasis on the integration of spatial and molecular information. This tutorial will cover the following tasks, which we believe will be common for many spatial analyses:
* Normalization
* Dimensional reduction and clustering
* Detecting spatially-variable features
* Interactive visualization
* Integration with single-cell RNA-seq data
* Working with multiple slices
For our first vignette, we analyze a dataset generated with the [Visium technology](https://www.10xgenomics.com/spatial-transcriptomics/) from 10x Genomics. We will be extending Seurat to work with additional data types in the near-future, including [SLIDE-Seq](https://science.sciencemag.org/content/363/6434/1463), [STARmap](https://science.sciencemag.org/content/361/6400/eaat5691), and [MERFISH](https://science.sciencemag.org/content/362/6416/eaau5324).
First, we load Seurat and the other packages necessary for this vignette.
```{r install}
library(Seurat)
options(Seurat.object.assay.version = "v5")
library(SeuratData)
library(ggplot2)
library(patchwork)
library(dplyr)
```
```{r libraries.for.rmd, echo = FALSE}
library("htmltools")
library("vembedr")
```
# 10x Visium
## Dataset
Here, we will be using a recently released dataset of sagital mouse brain slices generated using the Visium v1 chemistry. There are two serial anterior sections, and two (matched) serial posterior sections.
You can download the data [here](https://support.10xgenomics.com/spatial-gene-expression/datasets), and load it into Seurat using the `Load10X_Spatial()` function. This reads in the output of the [spaceranger](https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/what-is-space-ranger) pipeline, and returns a Seurat object that contains both the spot-level expression data along with the associated image of the tissue slice. You can also use our [SeuratData package](https://github.com/satijalab/seurat-data) for easy data access, as demonstrated below. After installing the dataset, you can type `?stxBrain` to learn more.
```{r data.install, eval = FALSE}
InstallData("stxBrain")
```
```{r data}
brain <- LoadData('stxBrain', type = 'anterior1')
brain[['Spatial']] <- as(brain[['Spatial']], Class = 'Assay5')
```
<details>
<summary>**How is the spatial data stored within Seurat? **</summary>
The visium data from 10x consists of the following data types:
* A spot by gene expression matrix
* An image of the tissue slice (obtained from H&E staining during data acquisition)
* Scaling factors that relate the original high resolution image to the lower resolution image used here for visualization.
In the Seurat object, the spot by gene expression matrix is similar to a typical "RNA" `Assay` but contains spot level, not single-cell level data. The image itself is stored in a new `images` slot in the Seurat object. The `images` slot also stores the information necessary to associate spots with their physical position on the tissue image.
</details>
## Data preprocessing
The initial preprocessing steps that we perform on the spot by gene expression data are similar to a typical scRNA-seq experiment. We first need to normalize the data in order to account for variance in sequencing depth across data points. We note that the variance in molecular counts / spot can be substantial for spatial datasets, particularly if there are differences in cell density across the tissue. We see substantial heterogeneity here, which requires effective normalization.
```{r qc, fig.height=5}
plot1 <- VlnPlot(brain, features = 'nCount_Spatial', pt.size = 0.1) + NoLegend()
plot2 <- SpatialFeaturePlot(brain, features = 'nCount_Spatial') + theme(legend.position = "right")
wrap_plots(plot1, plot2)
```
These plots demonstrate that the variance in molecular counts across spots is not just technical in nature, but also is dependent on the tissue anatomy. For example, regions of the tissue that are depleted for neurons (such as the cortical white matter), reproducibly exhibit lower molecular counts. As a result, standard approaches (such as the `LogNormalize()` function), which force each data point to have the same underlying 'size' after normalization, can be problematic.
As an alternative, we recommend using sctransform (Hafemeister and Satija, Genome Biology 2019), which which builds regularized negative binomial models of gene expression in order to account for technical artifacts while preserving biological variance. For more details on sctransform, please see the paper [here](https://doi.org/10.1186/s13059-019-1874-1) and the Seurat vignette [here](sctransform_vignette.html). sctransform normalizes the data, detects high-variance features, and stores the data in the `SCT` assay.
```{r preprocess}
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE)
```
<details>
<summary>**How do results compare to log-normalization?**</summary>
To explore the differences in normalization methods, we examine how both the sctransform and log normalization results correlate with the number of UMIs. For this comparison, we first rerun sctransform to store values for all genes and run a log-normalization procedure via `NormalizeData()`.
```{r norm.test}
# rerun normalization to store sctransform residuals for all genes
brain <- SCTransform(brain, assay = "Spatial", return.only.var.genes = FALSE, verbose = FALSE)
# also run standard log normalization for comparison
brain <- NormalizeData(brain, verbose = FALSE, assay = "Spatial")
```
```{r norm.test2}
# Computes the correlation of the log normalized data and sctransform residuals with the number of UMIs
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "Spatial", slot = "data", do.plot = FALSE)
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "SCT", slot = "scale.data", do.plot = FALSE)
```
```{r norm.test3}
p1 <- GroupCorrelationPlot(brain, assay = "Spatial", cor = "nCount_Spatial_cor") + ggtitle("Log Normalization") + theme(plot.title = element_text(hjust = 0.5))
p2 <- GroupCorrelationPlot(brain, assay = "SCT", cor = "nCount_Spatial_cor") + ggtitle("SCTransform Normalization") + theme(plot.title = element_text(hjust = 0.5))
p1 + p2
```
For the boxplots above, we calculate the correlation of each feature (gene) with the number of UMIs (the `nCount_Spatial` variable here). We then place genes into groups based on their mean expression, and generate boxplots of these correlations. You can see that log-normalization fails to adequately normalize genes in the first three groups, suggesting that technical factors continue to influence normalized expression estimates for highly expressed genes. In contrast, sctransform normalization substantially mitigates this effect.
</details>
## Gene expression visualization
In Seurat, we have functionality to explore and interact with the inherently visual nature of spatial data. The `SpatialFeaturePlot()` function in Seurat extends `FeaturePlot()`, and can overlay molecular data on top of tissue histology. For example, in this data set of the mouse brain, the gene Hpca is a strong hippocampus marker and Ttr is a marker of the choroid plexus.
```{r featureplot}
SpatialFeaturePlot(brain, features = c("Hpca", "Ttr"))
```
```{r save.img, include=TRUE}
library(ggplot2)
plot <- SpatialFeaturePlot(brain, features = c("Ttr")) +
theme(legend.text = element_text(size = 0), legend.title = element_text(size = 20), legend.key.size = unit(1, "cm"))
jpeg(filename = "../output/images/spatial_vignette_ttr.jpg", height = 700, width = 1200, quality = 50)
print(plot)
dev.off()
```
The default parameters in Seurat emphasize the visualization of molecular data. However, you can also adjust the size of the spots (and their transparency) to improve the visualization of the histology image, by changing the following parameters:
* `pt.size.factor`- This will scale the size of the spots. Default is 1.6
* `alpha` - minimum and maximum transparency. Default is c(1, 1).
* Try setting to `alpha` c(0.1, 1), to downweight the transparency of points with lower expression
```{r fpe1}
p1 <- SpatialFeaturePlot(brain, features = "Ttr", pt.size.factor = 1)
p2 <- SpatialFeaturePlot(brain, features = "Ttr", alpha = c(0.1, 1))
p1 + p2
```
## Dimensionality reduction, clustering, and visualization
We can then proceed to run dimensionality reduction and clustering on the RNA expression data, using the same workflow as we use for scRNA-seq analysis.
```{r dim.cluster}
brain <- RunPCA(brain, assay = "SCT", verbose = FALSE)
brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30)
brain <- FindClusters(brain, verbose = FALSE)
brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)
```
We can then visualize the results of the clustering either in UMAP space (with `DimPlot()`) or overlaid on the image with `SpatialDimPlot()`.
```{r dim.plots,fig.height=5}
p1 <- DimPlot(brain, reduction = "umap", label = TRUE)
p2 <- SpatialDimPlot(brain, label = TRUE, label.size = 3)
p1 + p2
```
As there are many colors, it can be challenging to visualize which voxel belongs to which cluster. We have a few strategies to help with this. Setting the `label` parameter places a colored box at the median of each cluster (see the plot above).
You can also use the `cells.highlight` parameter to demarcate particular cells of interest on a `SpatialDimPlot()`. This can be very useful for distinguishing the spatial localization of individual clusters, as we show below:
```{r facetdim}
SpatialDimPlot(brain, cells.highlight = CellsByIdentities(object = brain,idents = c(2, 1, 4, 3, 5, 8)), facet.highlight = TRUE, ncol = 3)
```
## Interactive plotting
We have also built in a number of interactive plotting capabilities. Both `SpatialDimPlot()` and `SpatialFeaturePlot()` now have an `interactive` parameter, that when set to `TRUE`, will open up the Rstudio viewer pane with an interactive Shiny plot. The example below demonstrates an interactive `SpatialDimPlot()` in which you can hover over spots and view the cell name and current identity class (analogous to the previous `do.hover` behavior).
```{r ispatialdimplot, eval = FALSE}
SpatialDimPlot(brain, interactive = TRUE)
```
```{r, echo = FALSE}
embed_url("https://youtu.be/E1aZjmG1neQ")
```
For `SpatialFeaturePlot()`, setting interactive to `TRUE` brings up an interactive pane in which you can adjust the transparency of the spots, the point size, as well as the `Assay` and feature being plotted. After exploring the data, selecting the done button will return the last active plot as a ggplot object.
```{r ispatialfeatureplot, eval = FALSE}
SpatialFeaturePlot(brain, features = "Ttr", interactive = TRUE)
```
```{r, echo = FALSE}
embed_url("https://youtu.be/ILmb8XNlgEM")
```
The `LinkedDimPlot()` function links the UMAP representation to the tissue image representation and allows for interactive selection. For example, you can select a region in the UMAP plot and the corresponding spots in the image representation will be highlighted.
```{r linkedplot, eval=FALSE}
LinkedDimPlot(brain)
```
```{r, echo = FALSE}
embed_url("https://youtu.be/10PZqjcSKrg")
```
## Identification of Spatially Variable Features
Seurat offers two workflows to identify molecular features that correlate with spatial location within a tissue. The first is to perform differential expression based on pre-annotated anatomical regions within the tissue, which may be determined either from unsupervised clustering or prior knowledge. This strategy works will in this case, as the clusters above exhibit clear spatial restriction.
```{r de, fig.height = 4}
de_markers <- FindMarkers(brain, ident.1 = 5, ident.2 = 6)
SpatialFeaturePlot(object = brain, features = rownames(de_markers)[1:3], alpha = c(0.1, 1), ncol = 3)
```
An alternative approach, implemented in `FindSpatiallyVariables()`, is to search for features exhibiting spatial patterning in the absence of pre-annotation. The default method (`method = 'markvariogram`), is inspired by the [Trendsceek](https://www.nature.com/articles/nmeth.4634), which models spatial transcriptomics data as a mark point process and computes a 'variogram', which identifies genes whose expression level is dependent on their spatial location. More specifically, this process calculates gamma(r) values measuring the dependence between two spots a certain "r" distance apart. By default, we use an r-value of '5' in these analyses, and only compute these values for variable genes (where variation is calculated independently of spatial location) to save time.
We note that there are multiple methods in the literature to accomplish this task, including [SpatialDE](https://www.nature.com/articles/nmeth.4636), and [Splotch](https://www.biorxiv.org/content/10.1101/757096v1.article-metrics). We encourage interested users to explore these methods, and hope to add support for them in the near future.
```{r spatial.vf}
brain <- FindSpatiallyVariableFeatures(brain, assay = 'SCT', features = VariableFeatures(brain)[1:1000], selection.method = 'moransi')
```
Now we visualize the expression of the top 6 features identified by this measure.
```{r spatial.vf.plot, fig.height=8}
top.features <- head(SpatiallyVariableFeatures(brain, selection.method = 'moransi'),6)
SpatialFeaturePlot(brain, features = top.features, ncol = 3, alpha = c(0.1, 1))
```
## Subset out anatomical regions
As with single-cell objects, you can subset the object to focus on a subset of data. Here, we approximately subset the frontal cortex. This process also facilitates the integration of these data with a cortical scRNA-seq dataset in the next section. First, we take a subset of clusters, and then further segment based on exact positions. After subsetting, we can visualize the cortical cells either on the full image, or a cropped image.
```{r subset1}
cortex <- subset(brain, idents = c(1, 2, 3, 4, 6, 7))
# now remove additional cells, use SpatialDimPlots to visualize what to remove
# SpatialDimPlot(cortex,cells.highlight = WhichCells(cortex, expression = image_imagerow > 400 | image_imagecol < 150))
cortex <- subset(cortex, anterior1_imagerow > 400 | anterior1_imagecol < 150, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 275 & anterior1_imagecol > 370, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 250 & anterior1_imagecol > 440, invert = TRUE)
```
```{r subset1.plot, fig.height = 4}
p1 <- SpatialDimPlot(cortex, crop = TRUE, label = TRUE)
p2 <- SpatialDimPlot(cortex, crop = FALSE, label = TRUE, pt.size.factor = 1, label.size = 3)
p1 + p2
```
## Integration with single-cell data
At ~50um, spots from the visium assay will encompass the expression profiles of multiple cells. For the growing list of systems where scRNA-seq data is available, users may be interested to 'deconvolute' each of the spatial voxels to predict the underlying composition of cell types. In preparing this vignette, we tested a wide variety of decovonlution and integration methods, using a [reference scRNA-seq dataset](https://www.nature.com/articles/nn.4216) of ~14,000 adult mouse cortical cell taxonomy from the Allen Institute, generated with the SMART-Seq2 protocol.
We consistently found superior performance using integration methods (as opposed to deconvolution methods), likely because of substantially different noise models that characterize spatial and single-cell datasets, and integration methods are specifiically designed to be robust to these differences. We therefore apply the 'anchor'-based integration workflow introduced in [Seurat v3](https://www.cell.com/cell/fulltext/S0092-8674(19)30559-8), that enables the probabilistic transfer of annotations from a reference to a query set. We therefore follow the label transfer workflow introduced [here](reference_mapping.html), taking advantage of sctransform normalization, but anticipate new methods to be developed to accomplish this task.
We first load the data (download available [here](https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1)), pre-process the scRNA-seq reference, and then perform label transfer. The procedure outputs, for each spot, a probabilistic classification for each of the scRNA-seq derived classes. We add these predictions as a new assay in the Seurat object.
```{r sc.data}
allen_reference <- readRDS("../data/allen_cortex.rds")
allen_reference <- UpdateSeuratObject(allen_reference)
```
```{r sc.data2}
# note that setting ncells=3000 normalizes the full dataset but learns noise models on 3k cells
# this speeds up SCTransform dramatically with no loss in performance
library(dplyr)
allen_reference <- SCTransform(allen_reference, ncells = 3000, verbose = FALSE) %>% RunPCA(verbose = FALSE) %>% RunUMAP(dims = 1:30)
```
```{r sc.data3, fig.width=8, fig.align="center"}
# After subsetting, we renormalize cortex
cortex <- SCTransform(cortex, assay = 'Spatial', verbose = FALSE) %>% RunPCA(verbose = FALSE)
# the annotation is stored in the 'subclass' column of object metadata
DimPlot(allen_reference, group.by = 'subclass', label = TRUE)
```
```{r sc.data5}
anchors <- FindTransferAnchors(reference = allen_reference, query = cortex, normalization.method = "SCT")
predictions.assay <- TransferData(anchorset = anchors, refdata = allen_reference$subclass, prediction.assay = TRUE, weight.reduction = cortex[["pca"]], dims = 1:30)
cortex[["predictions"]] <- predictions.assay
```
Now we get prediction scores for each spot for each class. Of particular interest in the frontal cortex region are the laminar excitatory neurons. Here we can distinguish between distinct sequential layers of these neuronal subtypes, for example:
```{r sc.data7}
DefaultAssay(cortex) <- "predictions"
SpatialFeaturePlot(cortex, features = c("L2/3 IT", "L4"), pt.size.factor = 1.6, ncol = 2, crop = TRUE)
```
Based on these prediction scores, we can also predict *cell types* whose location is spatially restricted. We use the same methods based on marked point processes to define spatially variable features, but use the cell type prediction scores as the "marks" rather than gene expression.
```{r sc.data8, fig.height = 10}
cortex <- FindSpatiallyVariableFeatures(cortex, assay = "predictions", selection.method = "moransi", features = rownames(cortex), r.metric = 5, slot = "data")
top.clusters <- head(SpatiallyVariableFeatures(cortex, selection.method = "moransi"), 4)
SpatialPlot(object = cortex, features = top.clusters, ncol = 2)
```
Finally, we show that our integrative procedure is capable of recovering the known spatial localization patterns of both neuronal and non-neuronal subsets, including laminar excitatory, layer-1 astrocytes, and the cortical grey matter.
```{r sc.data9,fig.height=20,fig.width=10}
SpatialFeaturePlot(cortex, features = c("Astro", "L2/3 IT", "L4", "L5 PT", "L5 IT", "L6 CT", "L6 IT", "L6b", "Oligo"), pt.size.factor = 1, ncol = 2, crop = FALSE, alpha = c(0.1, 1))
```
## Working with multiple slices in Seurat
This dataset of the mouse brain contains another slice corresponding to the other half of the brain. Here we read it in and perform the same initial normalization.
```{r brain2data}
brain2 <- LoadData('stxBrain', type = 'posterior1')
brain2[['Spatial']] <- as(brain2[['Spatial']], Class = 'Assay5')
brain2 <- SCTransform(brain2, assay = 'Spatial', verbose = FALSE)
```
In order to work with multiple slices in the same Seurat object, we provide the `merge` function.
```{r merge}
brain.merge <- merge(brain, brain2)
```
This then enables joint dimensional reduction and clustering on the underlying RNA expression data.
```{r joint.analysis}
DefaultAssay(brain.merge) <- "SCT"
VariableFeatures(brain.merge) <- c(VariableFeatures(brain), VariableFeatures(brain2))
brain.merge <- RunPCA(brain.merge, verbose = FALSE)
brain.merge <- FindNeighbors(brain.merge, dims = 1:30)
brain.merge <- FindClusters(brain.merge, verbose = FALSE)
brain.merge <- RunUMAP(brain.merge, dims = 1:30)
```
Finally, the data can be jointly visualized in a single UMAP plot. `SpatialDimPlot()` and `SpatialFeaturePlot()` will by default plot all slices as columns and groupings/features as rows.
```{r joint.viz, fig.height = 4}
DimPlot(brain.merge, reduction = "umap", group.by = c("ident", "orig.ident"))
```
```{r joint.viz2}
SpatialDimPlot(brain.merge)
```
```{r joint.viz3, fig.height = 10}
SpatialFeaturePlot(brain.merge, features = c('Hpca', 'Plp1'))
```
## Acknowledgments
We would like to thank Nigel Delaney and Stephen Williams for their helpful feedback and contributions to the new spatial Seurat code.
# Slide-seq
## Dataset
Here, we will be analyzing a dataset generated using [Slide-seq v2](https://www.biorxiv.org/content/10.1101/2020.03.12.989806v1) of the mouse hippocampus. This tutorial will follow much of the same structure as the spatial vignette for 10x Visium data but is tailored to give a demonstration specific to Slide-seq data.
You can use our [SeuratData package](https://github.com/satijalab/seurat-data) for easy data access, as demonstrated below. After installing the dataset, you can type `?ssHippo` to see the commands used to create the Seurat object.
```{r data.ss.install, eval = FALSE}
InstallData("ssHippo")
```
```{r data.ss}
slide.seq <- LoadData('ssHippo')
```
## Data preprocessing
The initial preprocessing steps for the bead by gene expression data are similar to other spatial Seurat analyses and to typical scRNA-seq experiments. Here, we note that many beads contain particularly low UMI counts but choose to keep all detected beads for downstream analysis.
```{r qc.ss, fig.height=5}
plot1 <- VlnPlot(slide.seq, features = 'nCount_Spatial', pt.size = 0, log = TRUE) + NoLegend()
slide.seq$log_nCount_Spatial <- log(slide.seq$nCount_Spatial)
plot2 <- SpatialFeaturePlot(slide.seq, features = 'log_nCount_Spatial') + theme(legend.position = "right")
wrap_plots(plot1, plot2)
```
We then normalize the data using [sctransform](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1874-1) and perform a standard scRNA-seq dimensionality reduction and clustering workflow.
```{r preprocess.ss}
slide.seq <- SCTransform(slide.seq, assay = "Spatial", ncells = 3000, n_genes=nrow(slide.seq[["Spatial"]]), verbose = FALSE)
slide.seq <- RunPCA(slide.seq, assay = "SCT")
slide.seq <- RunUMAP(slide.seq, dims = 1:30)
slide.seq <- FindNeighbors(slide.seq, dims = 1:30)
slide.seq <- FindClusters(slide.seq, resolution = 0.3, verbose = FALSE)
```
We can then visualize the results of the clustering either in UMAP space (with `DimPlot()`) or in the bead coordinate space with `SpatialDimPlot()`.
```{r dim.plots.ss,fig.height=5}
plot1 <- DimPlot(slide.seq, reduction = "umap", label = TRUE)
plot2 <- SpatialDimPlot(slide.seq, stroke = 0)
plot1 + plot2
SpatialDimPlot(slide.seq, cells.highlight = CellsByIdentities(object = slide.seq, idents = c(1, 6, 13)), facet.highlight = TRUE)
```
## Integration with a scRNA-seq reference
To facilitate cell-type annotation of the Slide-seq dataset, we are leveraging an existing mouse single-cell RNA-seq hippocampus dataset, produced in [Saunders\*, Macosko\*, et al. 2018](https://doi.org/10.1016/j.cell.2018.07.028). The data is available for download as a processed Seurat object [here](https://www.dropbox.com/s/cs6pii5my4p3ke3/mouse_hippocampus_reference.rds?dl=0), with the raw count matrices available on the [DropViz website](http://dropviz.org/).
```{r ref.saunders}
ref <- readRDS("../data/mouse_hippocampus_reference.rds")
ref <- UpdateSeuratObject(ref)
```
The original annotations from the paper are provided in the cell metadata of the Seurat object. These annotations are provided at several "resolutions", from broad classes (`ref$class`) to subclusters within celltypes (`ref$subcluster`). For the purposes of this vignette, we'll work off of a modification of the celltype annotations (`ref$celltype`) which we felt struck a good balance.
We'll start by running the Seurat label transfer method to predict the major celltype for each bead.
```{r ref.preprocessing.ss}
anchors <- FindTransferAnchors(reference = ref, query = slide.seq, normalization.method = "SCT", npcs = 50)
predictions.assay <- TransferData(anchorset = anchors, refdata = ref$celltype, prediction.assay = TRUE, weight.reduction = slide.seq[["pca"]], dims = 1:50)
slide.seq[["predictions"]] <- predictions.assay
```
We can then visualize the prediction scores for some of the major expected classes.
```{r transfer.viz.ss, fig.height = 8}
DefaultAssay(slide.seq) <- 'predictions'
SpatialFeaturePlot(slide.seq, features = c("Dentate Principal cells", "CA3 Principal cells", "Entorhinal cortex", "Endothelial tip", "Ependymal", "Oligodendrocyte"), alpha = c(0.1, 1))
```
```{r max.idents.ss}
slide.seq$predicted.id <- GetTransferPredictions(slide.seq)
Idents(slide.seq) <- "predicted.id"
SpatialDimPlot(slide.seq, cells.highlight = CellsByIdentities(object = slide.seq, idents = c("CA3 Principal cells", "Dentate Principal cells", "Endothelial tip")), facet.highlight = TRUE)
```
## Identification of Spatially Variable Features
As mentioned in the Visium vignette, we can identify spatially variable features in two general ways: differential expression testing between pre-annotated anatomical regions or statistics that measure the dependence of a feature on spatial location.
Here, we demonstrate the latter with an implementation of Moran's I available via `FindSpatiallyVariableFeatures()` by setting `method = 'moransi'`. Moran's I computes an overall spatial autocorrelation and gives a statistic (similar to a correlation coefficient) that measures the dependence of a feature on spatial location. This allows us to rank features based on how spatially variable their expression is. In order to facilitate quick estimation of this statistic, we implemented a basic binning strategy that will draw a rectangular grid over Slide-seq puck and average the feature and location within each bin. The number of bins in the x and y direction are controlled by the `x.cuts` and `y.cuts` parameters respectively. Additionally, while not required, installing the optional `Rfast2` package(`install.packages('Rfast2')`), will significantly decrease the runtime via a more efficient implementation.
```{r spatial.vf.ss}
DefaultAssay(slide.seq) <- "SCT"
slide.seq <- FindSpatiallyVariableFeatures(slide.seq, assay = 'SCT', slot = "scale.data", features = VariableFeatures(slide.seq)[1:1000], selection.method = 'moransi', x.cuts = 100, y.cuts = 100)
```
Now we visualize the expression of the top 6 features identified by Moran's I.
```{r spatial.vf.plot.ss, fig.height = 8}
SpatialFeaturePlot(slide.seq, features = head(SpatiallyVariableFeatures(slide.seq, selection.method = "moransi"), 6), ncol = 3, alpha = c(0.1, 1), max.cutoff = "q95")
```
## Spatial deconvolution using RCTD
While `FindTransferAnchors` can be used to integrate spot-level data from spatial transcriptomic datasets, Seurat v5 also includes support for the [Robust Cell Type Decomposition](https://www.nature.com/articles/s41587-021-00830-w), a computational approach to deconvolve spot-level data from spatial datasets, when provided with an scRNA-seq reference. RCTD has been shown to accurately annotate spatial data from a variety of technologies, including SLIDE-seq, Visium, and the 10x Xenium in-situ spatial platform.
To run RCTD, we first install the `spacexr` package from GitHub which implements RCTD.
```{r, eval=FALSE}
devtools::install_github("dmcable/spacexr", build_vignettes = FALSE)
```
Counts, cluster, and spot information is extracted from the Seurat query and reference objects to construct `Reference` and `SpatialRNA` objects used by RCTD for annotation.
```{r rctd.setup, warning=FALSE, results=FALSE}
library(spacexr)
# set up reference
ref <- readRDS("../data/mouse_hippocampus_reference.rds")
ref <- UpdateSeuratObject(ref)
Idents(ref) <- "celltype"
# extract information to pass to the RCTD Reference function
counts <- ref[["RNA"]]$counts
cluster <- as.factor(ref$celltype)
names(cluster) <- colnames(ref)
nUMI <- ref$nCount_RNA
names(nUMI) <- colnames(ref)
reference <- Reference(counts, cluster, nUMI)
# set up query with the RCTD function SpatialRNA
slide.seq <- SeuratData::LoadData("ssHippo")
counts <- slide.seq[["Spatial"]]$counts
coords <- GetTissueCoordinates(slide.seq)
colnames(coords) <- c("x", "y")
coords[is.na(colnames(coords))] <- NULL
query <- SpatialRNA(coords, counts, colSums(counts))
```
Using the `reference` and `query` object, we annotate the dataset and add the cell type labels to the query Seurat object. RCTD parallelizes well, so multiple cores can be specified for faster performance.
```{r run.rctd, warning=FALSE, results=FALSE}
RCTD <- create.RCTD(query, reference, max_cores = 8)
RCTD <- run.RCTD(RCTD, doublet_mode = 'doublet')
slide.seq <- AddMetaData(slide.seq, metadata = RCTD@results$results_df)
```
Next, plot the RCTD annotations. Because we ran RCTD in doublet mode, the algorithm assigns a `first_type` and `second_type` for each barcode or spot.
```{r rctd_results, fig.height=8, fig.width=14}
p1 <- SpatialDimPlot(slide.seq, group.by = "first_type")
p2 <- SpatialDimPlot(slide.seq, group.by = "second_type")
p1 | p2
```
```{r save.times, include=TRUE}
write.csv(x = t(as.data.frame(all_times)), file = "../output/timings/seurat5_spatial_vignette_times.csv")
```
<details>
<summary>**Session Info**</summary>
```{r}
sessionInfo()
```
</details>