vignettes/seurat5_spatial_vignette_2.Rmd

---
title: "Analysis of Image-based Spatial Data in Seurat"
output:
  html_document:
    theme: united
    df_print: kable
date: 'Compiled: `r format(Sys.Date(), "%B %d, %Y")`'
---
***

```{r setup, include=FALSE}
all_times <- list()  # store the time for each chunk
knitr::knit_hooks$set(time_it = local({
  now <- NULL
  function(before, options) {
    if (before) {
      now <<- Sys.time()
    } else {
      res <- difftime(Sys.time(), now, units = "secs")
      all_times[[options$label]] <<- res
    }
  }
}))
knitr::opts_chunk$set(
  tidy = TRUE,
  tidy.opts = list(width.cutoff = 95),
  message = FALSE,
  warning = FALSE,
  time_it = TRUE,
  error = TRUE
)
```

# Overview

In this vignette, we introduce a Seurat extension to analyze new types of spatially-resolved data. We have [previously introduced a spatial framework](https://satijalab.org/seurat/articles/spatial_vignette.html) which is compatible with sequencing-based technologies, like the 10x Genomics Visium system, or SLIDE-seq. Here, we extend this framework to analyze new data types that are captured via highly multiplexed imaging. In contrast to sequencing-based technologies, these datasets are often targeted (i.e. they profile a pre-selected set of genes). However they can resolve individual molecules - retaining single-cell (and subcellular) resolution. These approaches also often capture cellular boundaries (segmentations).

We update the Seurat infrastructure to enable the analysis, visualization, and exploration of these exciting datasets. In this vignette, we focus on three datasets produced by different multiplexed imaging technologies, each of which is publicly available. We will be adding support for additional imaging-based technologies in the coming months.

* Vizgen MERSCOPE (Mouse Brain)
* Nanostring CosMx Spatial Molecular Imager (FFPE Human Lung)
* Akoya CODEX (Human Lymph Node)

First, we load the packages necessary for this vignette.

```{r init, message=FALSE, warning=FALSE}
library(Seurat)
library(future)
plan("multisession", workers = 10)
library(ggplot2)
```

# Mouse Brain: Vizgen MERSCOPE

This dataset was produced using the Vizgen MERSCOPE system, which utilizes the MERFISH technology. The total dataset is available for [public download](https://info.vizgen.com/mouse-brain-data), and contains nine samples (three full coronal slices of the mouse brain, with three biological replicates per slice). The gene panel consists of 483 gene targets, representing known anonical cell type markers, nonsensory G-Protein coupled receptors (GPCRs), and Receptor Tyrosine Kinases (RTKs). In this vignette, we analyze one of the samples - slice 2, replicate 1. The median number of transcripts detected in each cell is 206.

First, we read in the dataset and create a Seurat object. 

We use the `LoadVizgen()` function, which we have written to read in the output  of the Vizgen analysis pipeline. The resulting Seurat object contains the following information:

* A count matrix, indicating the number of observed molecules for each of the 483 transcripts in each cell. This matrix is analogous to a count matrix in scRNA-seq, and is stored by default in the RNA assay of the Seurat object

```{r, message=FALSE, warning=FALSE}
# Loading segmentations is a slow process and multi processing with the future pacakge is recommended
vizgen.obj <- LoadVizgen(data.dir = "/brahms/hartmana/vignette_data/vizgen/s2r1/", fov = "s2r1")
```

The next pieces of information are specific to imaging assays, and is stored in the images slot of the resulting Seurat object:

<details>
  <summary>**Cell Centroids: The spatial coordinates marking the centroid for each cell being profiled**</summary>

```{r}
# Get the center position of each centroid. There is one row per cell in this dataframe.
head(GetTissueCoordinates(vizgen.obj[["s2r1"]][["centroids"]]))
```
</details>
<details>
  <summary>**Cell Segmentation Boundaries: The spatial coordinates that describe the polygon segmentation of each single cell**</summary>

```{r}
# Get the coordinates for each segmentation vertice. Each cell will have a variable number of vertices describing its shape.
head(GetTissueCoordinates(vizgen.obj[["s2r1"]][["segmentation"]]))
```
</details>
<details>
  <summary>**Molecule positions: The spatial coordinates for each individual molecule that was detected during the multiplexed smFISH experiment.**</summary>

```{r}
# Fetch molecules positions for Chrm1
head(FetchData(vizgen.obj[["s2r1"]][["molecules"]], vars="Chrm1"))
```
</details>
\

## Preprocessing and unsupervised analysis
We start by performing a standard unsupervised clustering analysis, essentially first treating the dataset as an scRNA-seq experiment. We use SCTransform-based normalization, though we slightly modify the default clipping parameters to mitigate the effect of outliers that we occasionally observe in smFISH experiments. After normalization, we can run dimensional reduction and clustering.

```{r analysis, results='hide'}
vizgen.obj <- SCTransform(vizgen.obj, assay = "Vizgen", clip.range = c(-10,10))
vizgen.obj <- RunPCA(vizgen.obj, npcs = 30, features = rownames(vizgen.obj))
vizgen.obj <- RunUMAP(vizgen.obj, dims = 1:30)
vizgen.obj <- FindNeighbors(vizgen.obj, reduction = "pca", dims = 1:30)
vizgen.obj <- FindClusters(vizgen.obj, resolution = 0.3)
```

We can then visualize the results of the clustering either in UMAP space (with `DimPlot()`) or overlaid on the image with `ImageDimPlot()`.

```{r umap}
DimPlot(vizgen.obj, reduction = "umap")
```

```{r spatial.plot, fig.height=6, fig.width=6}
ImageDimPlot(vizgen.obj, fov = "s2r1", cols = "polychrome", axes = TRUE)
```

You can also customize multiple aspect of the plot, including the color scheme, cell border widths, and size (see below).

<details>
  <summary>**Customizing spatial plots in Seurat**</summary>

The `ImageDimPlot()` and `ImageFeaturePlot()` functions have a few parameters which you can customize individual visualizations. These include:

* alpha: Ranges from 0 to 1. Sets the transparency of within-cell coloring.
* size: determines the size of points representing cells, if centroids are being plotted
* cols: Sets the color scheme for the internal shading of each cell. Examples settings are `polychrome`, `glasbey`, `Paired`, `Set3`, and `parade`. Default is the ggplot2 color palette
* shuffle.cols: In some cases the selection of `cols` is more effective when the same colors are assigned to different clusters. Set `shuffle.cols = TRUE` to randomly shuffle the colors in the palette.
* border.size: Sets the width of the cell segmentation borders. By default, segmentations are plotted with a border size of 0.3 and centroids are plotted without border.
* border.color: Sets the color of the cell segmentation borders
* dark.background: Sets a black background color (TRUE by default)
* axes: Display
</details>

Since it can be difficult to visualize the spatial localization patterns of an individual cluster when viewing them all together, we can highlight all cells that belong to a particular cluster:

```{r, fig.height=8, fig.width=12}
p1 <- ImageDimPlot(vizgen.obj, fov = "s2r1", cols = "red", cells = WhichCells(vizgen.obj, idents = 1))
p2 <- ImageDimPlot(vizgen.obj, fov = "s2r1", cols = "red", cells = WhichCells(vizgen.obj, idents = 15))
p1 + p2
```

We can find markers of individual clusters and visualize their spatial expression pattern. We can color cells based on their quantified expression of an individual gene, using `ImageFeaturePlot()`, which is analagous to the `FeaturePlot()` function for visualizing expression on a 2D embedding. Since MERFISH images individual molecules, we can also visualize the location of individual *molecules*. 

```{r, fig.height=7, fig.width=12}
p1 <- ImageFeaturePlot(vizgen.obj, features = "Slc17a7")
p2 <- ImageDimPlot(vizgen.obj, molecules = "Slc17a7", nmols = 10000, alpha = 0.3, mols.cols = "red")
p1 + p2
```

Note that the `nmols` parameter can be used to reduce the total number of molecules shown to reduce overplotting. You can also use the `mols.size`, `mols.cols`, and `mols.alpha` parameter to further optimize.

Plotting molecules is especially useful for visualizing co-expression of multiple genes on the same plot.

```{r, fig.height=7, fig.width=12}
p1 <- ImageDimPlot(vizgen.obj, fov = "s2r1", alpha = 0.3, molecules = c("Slc17a7", "Olig1"), nmols = 10000)
markers.14 <- FindMarkers(vizgen.obj, ident.1 = "14")
p2 <- ImageDimPlot(vizgen.obj, fov = "s2r1", alpha = 0.3, molecules = rownames(markers.14)[1:4], nmols = 10000)
p1 + p2
```

The updated Seurat spatial framework has the option to treat cells as individual points, or also to visualize cell boundaries (segmentations). By default, Seurat ignores cell segmentations and treats each cell as a point ('centroids'). This speeds up plotting, especially when looking at large areas, where cell boundaries are too small to visualize.

We can zoom into a region of tissue, creating a new field of view. For example, we can zoom into a region that contains the hippocampus. Once zoomed-in, we can set `DefaultBoundary()` to show cell segmentations. You can also 'simplify' the cell segmentations, reducing the number of edges in each polygon to  speed up plotting.

```{r, fig.height=5, fig.width=14}
# create a Crop
cropped.coords <- Crop(vizgen.obj[["s2r1"]], x = c(1750, 3000), y = c(3750, 5250), coords = "plot")
# set a new field of view (fov)
vizgen.obj[["hippo"]] <- cropped.coords

# visualize FOV using default settings (no cell boundaries)
p1 <- ImageDimPlot(vizgen.obj, fov = "hippo", axes = TRUE, size = 0.7, border.color = "white", cols = "polychrome", coord.fixed = FALSE)

# visualize FOV with full cell segmentations
DefaultBoundary(vizgen.obj[["hippo"]]) <- "segmentation"
p2 <- ImageDimPlot(vizgen.obj, fov = "hippo", axes = TRUE, border.color = "white", border.size = 0.1, cols = "polychrome", coord.fixed = FALSE)

# simplify cell segmentations
vizgen.obj[["hippo"]][["simplified.segmentations"]] <- Simplify(coords = vizgen.obj[["hippo"]][["segmentation"]], tol = 3)
DefaultBoundary(vizgen.obj[["hippo"]]) <- "simplified.segmentations"

# visualize FOV with simplified cell segmentations
DefaultBoundary(vizgen.obj[["hippo"]]) <- "simplified.segmentations"
p3 <- ImageDimPlot(vizgen.obj, fov = "hippo", axes = TRUE, border.color = "white", border.size = 0.1, cols = "polychrome", coord.fixed = FALSE)

p1 + p2 + p3
```

<details>
  <summary>**What is the tol parameter?**</summary>

The tol parameter determines how simplified the resulting segmentations are. A higher value of tol will reduce the number of vertices more drastically which will speed up plotting, but some segmentation detail will be lost. See https://rgeos.r-forge.r-project.org/reference/topo-unary-gSimplify.html for examples using different values for tol.

</details>

We can visualize individual molecules plotted at higher resolution after zooming-in
```{r, fig.height=8, fig.width=8}
# Since there is nothing behind the segmentations, alpha will slightly mute colors
ImageDimPlot(vizgen.obj, fov = "hippo", molecules = rownames(markers.14)[1:4], cols = "polychrome", mols.size = 1, alpha = 0.5, mols.cols = c("red", "blue", "yellow", "green"))
```

# Mouse Brain: 10x Genomics Xenium In Situ

In this section we'll analyze data produced by the Xenium platform. The vignette demonstrates how to load the per-transcript location data, cell x gene matrix, cell segmentation, and cell centroid information available in the Xenium outputs. The resulting Seurat object will contain the gene expression profile of each cell, the centroid and boundary of each cell, and the location of each individual detected transcript. The per-cell gene expression profiles are similar to standard single-cell RNA-seq and can be analyzed using the same tools.

This uses the `Tiny subset` dataset from 10x Genomics provided in the [Fresh Frozen Mouse Brain for Xenium Explorer Demo](https://www.10xgenomics.com/resources/datasets/fresh-frozen-mouse-brain-for-xenium-explorer-demo-1-standard) which can be downloaded as described below. These analysis steps are also compatible with the larger `Full coronal section`, but will take longer to execute.

```{bash, eval=FALSE}
wget https://cf.10xgenomics.com/samples/xenium/1.0.2/Xenium_V1_FF_Mouse_Brain_Coronal_Subset_CTX_HP/Xenium_V1_FF_Mouse_Brain_Coronal_Subset_CTX_HP_outs.zip
unzip Xenium_V1_FF_Mouse_Brain_Coronal_Subset_CTX_HP_outs.zip
```

First we read in the dataset and create a Seurat object. Provide the path to the data folder for a Xenium run as the input path. The RNA data is stored in the `Xenium` assay of the Seurat object. This step should take about a minute.

```{r load.xenium, results='hide'}
path <- "/brahms/hartmana/vignette_data/xenium_tiny_subset"
# Load the Xenium data
xenium.obj <- LoadXenium(path, fov = "fov")
# remove cells with 0 counts
xenium.obj <- subset(xenium.obj, subset = nCount_Xenium > 0)
```

Spatial information is loaded into slots of the Seurat object, labelled by the name of "field of view" (FOV) being loaded. Initially all the data is loaded into the FOV named `fov`. Later, we will make a cropped FOV that zooms into a region of interest.

Standard QC plots provided by Seurat are available via the `Xenium` assay. Here are violin plots of genes per cell (`nFeature_Xenium`) and transcript counts per cell (`nCount_Xenium`)
```{r vlnplot.xenium}
VlnPlot(xenium.obj, features = c("nFeature_Xenium", "nCount_Xenium"), ncol = 2, pt.size = 0)
```

Next, we plot the positions of the pan-inhibitory neuron marker Gad1, inhibitory neuron sub-type markers Pvalb, and Sst, and astrocyte marker Gfap on the tissue using `ImageDimPlot()`.
```{r p2.xenium, fig.width=10, fig.height=8}
ImageDimPlot(xenium.obj, fov = "fov", molecules = c("Gad1", "Sst", "Pvalb", "Gfap"), nmols = 20000)
```

```{r save.img, include=FALSE}
plot <- ImageDimPlot(xenium.obj, fov = "fov", molecules = c("Gad1", "Gfap"), nmols = 40000, alpha=0.01, dark.background = F, mols.alpha = 0.6) + coord_flip() + scale_x_reverse() + NoLegend()
ggsave(filename = "../output/images/spatial_vignette_2.jpg", height = 5, width = 9, plot = plot)
```

Here we visualize the expression level of some key layer marker genes at the per-cell level using `ImageFeaturePlot()` which is analogous to the `FeaturePlot()` function for visualizing expression on a 2D embedding. We manually adjust the `max.cutoff` for each gene to roughly the 90th percentile (which can be specified with `max.cutoff='q90'`) of it's count distribution to improve contrast.
```{r mat.xenium, message=FALSE, warning=FALSE, fig.width=12, fig.height=12}
ImageFeaturePlot(xenium.obj, features = c("Cux2", "Rorb", "Bcl11b", "Foxp2"), max.cutoff = c(25, 35, 12, 10), size = 0.75, cols = c("white", "red"))
```

We can zoom in on a chosen area with the `Crop()` function. Once zoomed-in, we can visualize cell segmentation boundaries along with individual molecules.
```{r cropping.xenium, message=FALSE, warning=FALSE, fig.width=10, fig.height=8}
cropped.coords <- Crop(xenium.obj[["fov"]], x = c(1200, 2900), y = c(3750, 4550), coords = "plot")
xenium.obj[["zoom"]] <- cropped.coords
# visualize cropped area with cell segmentations & selected molecules
DefaultBoundary(xenium.obj[["zoom"]]) <- "segmentation"
ImageDimPlot(xenium.obj, fov = "zoom", 
             axes = TRUE, border.color = "white", border.size = 0.1,
             cols = "polychrome", coord.fixed = FALSE,
             molecules = c("Gad1", "Sst", "Npy2r", "Pvalb", "Nrn1"), nmols = 10000)
```

Next, we use SCTransform for normalization followed by standard dimensionality reduction and clustering. This step takes about 5 minutes from start to finish.
```{r unsupervised.xenium, results='hide'}
xenium.obj <- SCTransform(xenium.obj, assay = "Xenium")
xenium.obj <- RunPCA(xenium.obj, npcs = 30, features = rownames(xenium.obj))
xenium.obj <- RunUMAP(xenium.obj, dims = 1:30)
xenium.obj <- FindNeighbors(xenium.obj, reduction = "pca", dims = 1:30)
xenium.obj <- FindClusters(xenium.obj, resolution = 0.3)
```

We can then visualize the results of the clustering by coloring each cell according to its cluster either in UMAP space with `DimPlot()` or overlaid on the image with `ImageDimPlot()`.
```{r umap.xenium, fig.width=10, fig.height=7}
DimPlot(xenium.obj)
```

We can visualize the expression level of the markers we looked at earlier on the UMAP coordinates.
```{r features.xenium, fig.width=8, fig.height=10}
FeaturePlot(xenium.obj, features = c("Cux2", "Bcl11b", "Foxp2", "Gad1", "Sst", "Gfap"))
```

We can now use `ImageDimPlot()` to color the cell positions colored by the cluster labels determined in the previous step.
```{r clusters.xenium, fig.width=13, fig.height=13}
ImageDimPlot(xenium.obj, cols = "polychrome", size = 0.75)
```

Using the positional information of each cell, we compute spatial niches.
We use a cortex reference from the the Allen Brain Institute to annotate cells, so we first crop the dataset to the cortex. The Allen Brain reference can be installed [here](https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1).

Below, we use Slc17a7 expression to help determine the cortical region.

```{r, fig.width=5, fig.height=5, warning=FALSE}
xenium.obj <- LoadXenium("/brahms/hartmana/vignette_data/xenium_tiny_subset")
p1 <- ImageFeaturePlot(xenium.obj, features = "Slc17a7", axes = TRUE, max.cutoff = "q90")
p1
```

```{r resolve.crop, fig.width=5, fig.height=7, warning=FALSE}
crop <- Crop(xenium.obj[["fov"]], x = c(600, 2100), y = c(900, 4700))
xenium.obj[["crop"]] <- crop
p2 <- ImageFeaturePlot(
  xenium.obj,
  fov = "crop",
  features = "Slc17a7",
  size = 1,
  axes = TRUE,
  max.cutoff = "q90")
p2
```

While `FindTransferAnchors` can be used to integrate spot-level data from spatial transcriptomic datasets, Seurat v5 also includes support for the [Robust Cell Type Decomposition](https://www.nature.com/articles/s41587-021-00830-w), a computational approach to deconvolve spot-level data from spatial datasets, when provided with an scRNA-seq reference. RCTD has been shown to accurately annotate spatial data from a variety of technologies, including SLIDE-seq, Visium, and the 10x Xenium in-situ spatial platform.

To run RCTD, we first install the `spacexr` package from GitHub which implements RCTD.

```{r, rctd.install, eval=FALSE}
devtools::install_github("dmcable/spacexr", build_vignettes = FALSE)
```

Counts, cluster, and spot information is extracted from the Seurat query and reference objects to construct `Reference` and `SpatialRNA` objects used by RCTD for annotation. The output of the annotation is then added to the Seurat object.

```{r rctd.qeury, warning=FALSE}
library(spacexr)

query.counts <- GetAssayData(xenium.obj, assay = "Xenium", slot = "counts")[, Cells(xenium.obj[["crop"]])]
coords <- GetTissueCoordinates(xenium.obj[["crop"]], which = "centroids")
rownames(coords) <- coords$cell
coords$cell <- NULL
query <- SpatialRNA(coords, query.counts, colSums(query.counts))
```

```{r rctd.reference, eval=FALSE}
# allen.corted.ref can be downloaded here: https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1
allen.cortex.ref <- readRDS("/brahms/shared/vignette-data/allen_cortex.rds")
allen.cortex.ref <- UpdateSeuratObject(allen.cortex.ref)

Idents(allen.cortex.ref) <- "subclass"
# remove CR cells because there aren't enough of them for annotation
allen.cortex.ref <- subset(allen.cortex.ref, subset = subclass != "CR")
counts <- GetAssayData(allen.cortex.ref, assay = "RNA", slot = "counts")
cluster <- as.factor(allen.cortex.ref$subclass)
names(cluster) <- colnames(allen.cortex.ref)
nUMI <- allen.cortex.ref$nCount_RNA
names(nUMI) <- colnames(allen.cortex.ref)
nUMI <- colSums(counts)
levels(cluster) <- gsub("/", "-", levels(cluster))
reference <- Reference(counts, cluster, nUMI)
```

```{r niche.run.rctd, warning=FALSE, results=FALSE, eval=FALSE}
# run RCTD with many cores
RCTD <- create.RCTD(query, reference, max_cores = 8)
RCTD <- run.RCTD(RCTD, doublet_mode = "doublet")
```

```{r niche.add.annotations, eval=FALSE}
annotations.df <- RCTD@results$results_df
annotations <- annotations.df$first_type
names(annotations) <- rownames(annotations.df)
xenium.obj$predicted.celltype <- annotations
keep.cells <- Cells(xenium.obj)[!is.na(xenium.obj$predicted.celltype)]
xenium.obj <- subset(xenium.obj, cells = keep.cells)
```

While the previous analyses consider each cell independently, spatial data enables cells to be defined not just by their neighborhood, but also by their broader spatial context. In Seurat v5, we introduce support for 'niche' analysis of spatial data, which demarcates regions of tissue ('niches'), each of which is defined by a different composition of spatially adjacent cell types. Inspired by methods in [Goltsev et al, Cell 2018](https://www.sciencedirect.com/science/article/pii/S0092867418309048) and [He et al, NBT 2022](https://www.nature.com/articles/s41587-022-01483-z), we consider the 'local neighborhood' for each cell - consisting of its `k.neighbor` spatially closest neighbors, and count the occurrences of each cell type present in this neighborhood. We then use k-means clustering to group cells that have similar neighborhoods together, into spatial niches. 

We call the `BuildNicheAssay` function from within Seurat to construct a new assay called `niche` containing the cell type composition spatially neighboring each cell. A metadata column called `niches` is also returned, which contains cluster assignments based on the niche assay.

```{r build.niche.assay, eval=FALSE}
xenium.obj <- BuildNicheAssay(
  object = xenium.obj,
  fov = "crop",
  group.by = "predicted.celltype",
  niches.k = 5,
  neighbors.k = 30
)
```

```{r load.niche.results, eval=TRUE, include=FALSE}
xenium.obj <- readRDS("/brahms/hartmana/vignette_data/xenium_niches_presaved.rds")
```

We can then group cells either by their cell type identity, or their niche identity. The niches identified clearly demarcate the neuronal layers in the cortex.

```{r, niche.dimplots, fig.width=8, fig.height=6, warning=FALSE}
celltype.plot <- ImageDimPlot(
  xenium.obj,
  group.by = "predicted.celltype",
  size = 1.5,
  cols = "polychrome",
  dark.background = F) +
  ggtitle("Cell type")
niche.plot <- ImageDimPlot(
  xenium.obj,
  group.by = "niches",
  size = 1.5,
  dark.background = F) +
  ggtitle("Niches") +
  scale_fill_manual(
    values = c("#442288", "#6CA2EA", "#B5D33D", "#FED23F", "#EB7D5B"))
celltype.plot | niche.plot
```

Further, we observe that the composition of each niche is enriched for distinct cell types.

```{r niche.composition}
table(xenium.obj$predicted.celltype, xenium.obj$niches)
```

# Human Lung: Nanostring CosMx Spatial Molecular Imager

This dataset was produced using Nanostring CosMx Spatial Molecular Imager (SMI). The CosMX SMI performs multiplexed single molecule profiling, can profile both RNA and protein targets, and can be applied directly to FFPE tissues. The dataset represents 8 FFPE samples taken from 5 non-small-cell lung cancer (NSCLC) tissues, and is available for [public download](https://www.nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/). The gene panel consists of 960 transcripts. 

In this vignette, we load one of 8 samples (lung 5, replicate 1). We use the `LoadNanostring()` function, which parses the outputs available on the public download site. Note that the coordinates for the cell boundaries were provided by Nanostring by request, and are available for download [here](https://www.dropbox.com/s/hl3peavrx92bluy/Lung5_Rep1-polygons.csv?dl=0). 

For this dataset, instead of performing unsupervised analysis, we map the Nanostring profiles to our Azimuth Healthy Human Lung reference, which was defined by scRNA-seq. We used Azimuth version 0.4.3 with the [human lung](https://azimuth.hubmapconsortium.org/references/#Human%20-%20Lung%20v1) reference version 1.0.0.  You can download the precomputed results [here](https://seurat.nygenome.org/vignette_data/spatial_vignette_2/nanostring_data.Rds), which include annotations, prediction scores, and a UMAP visualization. The median number of detected transcripts/cell is 249, which does create uncertainty for the annotation process.

```{r load}
nano.obj <- LoadNanostring(data.dir = "/brahms/hartmana/vignette_data/nanostring/lung5_rep1", fov="lung5.rep1")
```

```{r integration}
# add in precomputed Azimuth annotations
azimuth.data <- readRDS("/brahms/hartmana/vignette_data/nanostring_data.Rds")
nano.obj <- AddMetaData(nano.obj, metadata = azimuth.data$annotations)
nano.obj[["proj.umap"]] <- azimuth.data$umap
Idents(nano.obj) <- nano.obj$predicted.annotation.l1

# set to avoid error exceeding max allowed size of globals
options(future.globals.maxSize = 8000 * 1024^2) 
nano.obj <- SCTransform(nano.obj, assay = "Nanostring", clip.range = c(-10, 10), verbose = FALSE)

# text display of annotations and prediction scores
head(slot(object = nano.obj, name = "meta.data")[2:5])
```

We can visualize the Nanostring cells and annotations, projected onto the reference-defined UMAP. Note that for this NSCLC sample, tumor samples are annotated as 'basal', which is the closest cell type match in the healthy reference. 

```{r, fig.width=9, fig.height=4}
DimPlot(nano.obj)
```

## Visualization of cell type and expression localization patterns

As in the previous example, `ImageDimPlot()` plots c ells based on their spatial locations, and colors them based on their assigned cell type. Notice that the basal cell population (tumor cells) is tightly spatially organized, as expected.

```{r, fig.width=11, fig.height=7}
ImageDimPlot(nano.obj, fov = "lung5.rep1", axes = TRUE, cols = "glasbey")
```

Since there are many cell types present, we can highlight the localization of a few select groups.

```{r, fig.width=10, fig.height=7}
ImageDimPlot(nano.obj, fov = "lung5.rep1", cells = WhichCells(nano.obj, idents=c("Basal", "Macrophage", "Smooth Muscle", "CD4 T")), cols=c("red", "green", "blue", "orange"), size = 0.6)
```

We can also visualize gene expression markers a few different ways:

```{r, fig.width=10, fig.height=5}
VlnPlot(nano.obj, features = "KRT17", assay = "Nanostring", layer = "counts", pt.size = 0.1, y.max = 30) + NoLegend()
```

```{r, fig.width=5, fig.height=4}
FeaturePlot(nano.obj, features = "KRT17", max.cutoff = "q95")
```

```{r, fig.height=4, fig.width=8}
p1 <- ImageFeaturePlot(nano.obj, fov = "lung5.rep1", features = "KRT17", max.cutoff = "q95")
p2 <- ImageDimPlot(nano.obj, fov = "lung5.rep1", alpha = 0.3, molecules = "KRT17", nmols = 10000) + NoLegend()
p1 + p2
```

We can plot molecules in order to co-visualize the expression of multiple markers, including KRT17 (basal cells), C1QA (macrophages), IL7R (T cells), and TAGLN (Smooth muscle cells).

```{r, fig.width=10, fig.height=7}
# Plot some of the molecules which seem to display spatial correlation with each other
ImageDimPlot(nano.obj, fov = "lung5.rep1", group.by = NA, alpha = 0.3, molecules = c("KRT17", "C1QA", "IL7R", "TAGLN"), nmols = 20000)
```

We zoom in on one basal-rich region using the `Crop()` function. Once zoomed-in, we can visualize individual cell boundaries as well in all visualizations.

```{r}
basal.crop <- Crop(nano.obj[["lung5.rep1"]], x = c(159500, 164000), y = c(8700, 10500))
nano.obj[["zoom1"]] <- basal.crop
DefaultBoundary(nano.obj[["zoom1"]]) <- "segmentation"
```

```{r, fig.width=11, fig.height=7}
ImageDimPlot(nano.obj, fov = "zoom1", cols = "polychrome", coord.fixed = FALSE)
```

```{r, fig.width=11, fig.height=7}
# note the clouds of TPSAB1 molecules denoting mast cells
ImageDimPlot(nano.obj, fov = "zoom1", cols = "polychrome", alpha = 0.3, molecules = c("KRT17", "IL7R", "TPSAB1"), mols.size = 0.3, nmols = 20000, border.color = "black", coord.fixed = FALSE)
```

# Human Lymph Node: Akoya CODEX system

This dataset was produced using Akoya CODEX system. The CODEX system performs multiplexed spatially-resolved protein profiling, iteratively visualizing antibody-binding events. The dataset here represents a tissue section from a human lymph node, and was generated by the University of Florida as part of the Human Biomolecular Atlas Program (HuBMAP). More information about the sample and experiment is available [here](https://portal.hubmapconsortium.org/browse/dataset/c95d9373d698faf60a66ffdc27499fe1). The protein panel in this dataset consists of 28 markers, and protein intensities were quantified as part of the Akoya processor pipeline, which outputs a CSV file providing the intensity of each marker in each cell, as well as the cell coordinates. The file is available for public download via Globus [here](https://app.globus.org/file-manager?origin_id=af603d86-eab9-4eec-bb1d-9d26556741bb&origin_path=%2Fc95d9373d698faf60a66ffdc27499fe1%2Fdrv_CX_20-008_lymphnode_n10_reg001%2Fprocessed_2020-12-2320-008LNn10r001%2Fsegm%2Fsegm-1%2Ffcs%2Fcompensated%2F).


First, we load in the data of a HuBMAP dataset using the `LoadAkoya()` function in Seurat: 

```{r}
codex.obj <- LoadAkoya(
  filename = "/brahms/hartmana/vignette_data/LN7910_20_008_11022020_reg001_compensated.csv",
  type = "processor",
  fov = "HBM754.WKLP.262"
)
```

We can now run unsupervised analysis to identify cell clusters. To normalize the protein data, we use centered log-ratio based normalization, as we typically apply to the protein modality of CITE-seq data. We then run dimensional reduction and graph-based clustering.

```{r}
codex.obj <- NormalizeData(object = codex.obj, normalization.method = "CLR", margin = 2)
codex.obj <- ScaleData(codex.obj)
VariableFeatures(codex.obj) <- rownames(codex.obj) # since the panel is small, treat all features as variable.
codex.obj <- RunPCA(object = codex.obj, npcs = 20, verbose = FALSE)
codex.obj <- RunUMAP(object = codex.obj, dims = 1:20, verbose = FALSE)
codex.obj <- FindNeighbors(object = codex.obj, dims = 1:20, verbose = FALSE)
codex.obj <- FindClusters(object = codex.obj, verbose = FALSE, resolution = 0.4, n.start = 1)
```

We can visualize the cell clusters based on a protein intensity-based UMAP embedding, or also based on their spatial location.

```{r}
DimPlot(codex.obj, label = TRUE, label.box = TRUE) + NoLegend()
```

```{r, fig.width=6, fig.height=5}
ImageDimPlot(codex.obj, cols = "parade")
```

The expression patters of individual markers clearly denote different cell types and spatial structures, including Lyve1 (lymphatic endothelial cells), CD34 (blood endothelial cells), and CD21 (B cells). As expected, endothelial cells group together into vessels, and B cells are key components of specialized microstructures known as germinal zones. You can read more about protein markers in this dataset, as well as cellular networks in human lynmphatic tissues, in this  [preprint](https://www.biorxiv.org/content/10.1101/2021.10.20.465151v1.full).

```{r, fig.width=9, fig.height=8}
p1 <- ImageFeaturePlot(codex.obj, fov = "HBM754.WKLP.262", features = c("CD34", "CD21", "Lyve1"), min.cutoff = "q10", max.cutoff = "q90")
p2 <- ImageDimPlot(codex.obj, fov = "HBM754.WKLP.262", cols = "parade")
p1 + p2
```

Each of these datasets represents an opportunity to learn organizing principles that govern the spatial localization of different cell types. Stay tuned for future updates to Seurat enabling further exploration and characterization of the relationship between spatial position and molecular state.

<details>
  <summary>**Session Info**</summary>
```{r}
sessionInfo()
```
</details>

```{r save.times, include=FALSE}
write.csv(x = t(as.data.frame(all_times)), file = "../output/timings/seurat5_spatial_vignette_2.csv")
```