Seurat包学习笔记(三):Analysis of spatial datasets

本次教程中,我们将学习如何使用Seurat3处理空间转录组数据。整体的分析流程类似于Seurat的单细胞RNA-seq分析流程,同时我们还引入了一些交互可视化的分析工具,将细胞所处的空间信息和分子表达信息进行整合。特别强调了空间和分子信息的集成。

image.png

主要的分析流程:

  • 数据的归一化
  • 数据降维和聚类
  • 检测空间可变特征(spatially-variable features)
  • 交互式可视化
  • 与单细胞RNA-seq数据进行整合
  • 处理多个切片数据

在本教程中,我们使用来自10x Genomics的Visium技术生成的数据集进行示例分析。同时,我们今后还会将Seurat扩展到处理其他空间转录组技术产生的数据类型,如SLIDE-Seq、STARmap和MERFISH等。

使用Seurat3处理10x Visium空间转录组数据

安装并加载所需的R包和数据集

# 使用devtools进行安装
devtools::install_github("satijalab/seurat", ref = "spatial")
library(Seurat)
library(ggplot2)
library(patchwork)
library(dplyr)

library(SeuratData)
# 使用SeuratData包安装小鼠大脑的空间转录组数据
InstallData("stxBrain")
library(stxBrain.SeuratData)
# 使用LoadData函数加载数据集
brain <- LoadData("stxBrain", type = "anterior1")
brain
An object of class Seurat 
31053 features across 2696 samples within 1 assay 
Active assay: Spatial (31053 features, 0 variable features)

# 查看Seurat对象包含的信息
View(brain)
image.png

10x genomics Visium技术产生的数据主要包括以下数据类型:

  • A spot by gene expression matrix
  • An image of the tissue slice (obtained from H&E staining during data acquisition)
  • Scaling factors that relate the original high resolution image to the lower resolution image used here for visualization

数据预处理

# 数据质控,查看原始数据的基本特征
plot1 <- VlnPlot(brain, features = "nCount_Spatial", pt.size = 0.1) + NoLegend()
plot2 <- SpatialFeaturePlot(brain, features = "nCount_Spatial") + theme(legend.position = "right")
wrap_plots(plot1, plot2)
image.png
# 数据标准化,使用SCTransform方法进行标准化
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE)

比较SCTransform方法和常规log-normalization的效果

# rerun normalization to store sctransform residuals for all genes
# SCTransform
brain <- SCTransform(brain, assay = "Spatial", return.only.var.genes = FALSE, verbose = FALSE)

# also run standard log normalization for comparison
# NormalizeData
brain <- NormalizeData(brain, verbose = FALSE, assay = "Spatial")

# Computes the correlation of the log normalized data and sctransform residuals with the number of UMIs
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "Spatial", slot = "data", do.plot = FALSE)
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "SCT", slot = "scale.data", do.plot = FALSE)

p1 <- GroupCorrelationPlot(brain, assay = "Spatial", cor = "nCount_Spatial_cor") + ggtitle("Log Normalization") + 
    theme(plot.title = element_text(hjust = 0.5))
p2 <- GroupCorrelationPlot(brain, assay = "SCT", cor = "nCount_Spatial_cor") + ggtitle("SCTransform Normalization") + 
    theme(plot.title = element_text(hjust = 0.5))
p1 + p2
image.png

在上面的箱线图中,我们计算了每个特征(基因)与UMI数量(此处为变量nCount_Spatial)之间的相关性。然后,根据基因的平均表达量将它们进行了分组,最终生成了这些相关系数的箱线图。可以看到,使用常规的NormalizeData进行log-normalization未能使前三组的基因充分归一化,这表明技术因素继续影响高表达的基因的归一化表达估计值。相反,sctransform方法进行归一化可以有效的降低这种影响。

基因表达的可视化

在Seurat v3.2中,我们加入了新的功能函数来对空间转录组的数据进行可视化的探索。我们将Seurat的SpatialFeaturePlot函数扩展了FeaturePlot, 可以将基因的表达数据还原在样本组织的空间位置上。例如,在这个小鼠大脑切片数据中,Hpca基因是一个强烈的海马区(hippocampal )marker ,Ttr是一个脉络丛(choroid plexus)marker。

SpatialFeaturePlot(brain, features = c("Hpca", "Ttr"))
image.png

Seurat的默认参数强调了分子数据的可视化。我们也可以通过调整其他参数,如调整斑点的大小(和它们的透明度),来改善组织学图像的可视化效果。通过改变以下参数:

  • pt.size.factor -- 调整斑点的大小。默认值为1.6
  • alpha -- 调整斑点的最小和最大透明度。默认值为c(1,1)。
    尝试设置alpha为c(0.1,1),以降低具有较低表达的点的透明度
p1 <- SpatialFeaturePlot(brain, features = "Ttr", pt.size.factor = 1)
p2 <- SpatialFeaturePlot(brain, features = "Ttr", alpha = c(0.1, 1))
p1 + p2
image.png

数据降维、聚类与可视化

# PCA降维
brain <- RunPCA(brain, assay = "SCT", verbose = FALSE)
# 图聚类分群
brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30)
brain <- FindClusters(brain, verbose = FALSE)
# UMAP降维可视化
brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)
p1 <- DimPlot(brain, reduction = "umap", label = TRUE)
# 使用SpatialDimPlot函数进行可视化
p2 <- SpatialDimPlot(brain, label = TRUE, label.size = 3)
p1 + p2
image.png
# 使用cells.highlight参数高亮感兴趣的一些细胞
SpatialDimPlot(brain, cells.highlight = CellsByIdentities(object = brain, idents = c(1, 2, 5, 3, 4, 8)), facet.highlight = TRUE, ncol = 3)
image.png

交互式可视化

We have also built in a number of interactive plotting capabilities. Both SpatialDimPlot and SpatialFeaturePlot now have an interactive parameter, that when set to TRUE, will open up the Rstudio viewer pane with an interactive Shiny plot.

SpatialDimPlot(brain, interactive = TRUE)
image.png
LinkedDimPlot(brain)
image.png

鉴定空间可变基因(Spatially Variable Features)

Seurat提供了两种工作流程来识别与组织内空间位置相关的分子特征。

  • 第一种方法,可以基于组织内预先标注的解剖区域进行差异表达分析,这可以从非监督聚类或先验知识中确定。在这种情况下,此策略将起作用,因为上面的cluster显示出明显的空间限制。
de_markers <- FindMarkers(brain, ident.1 = 4, ident.2 = 6)
SpatialFeaturePlot(object = brain, features = rownames(de_markers)[1:3], alpha = c(0.1, 1), ncol = 3)
image.png
  • 第二种方法,使用FindSpatiallyVariables函数在没有预先注释的情况下鉴定出不同空间模式(spatial patterning)的特征。
brain <- FindSpatiallyVariableFeatures(brain, assay = "SCT", features = VariableFeatures(brain)[1:1000], selection.method = "markvariogram")
top.features <- head(SpatiallyVariableFeatures(brain, selection.method = "markvariogram"), 6)
SpatialFeaturePlot(brain, features = top.features, ncol = 3, alpha = c(0.1, 1))
image.png

可视化解剖区域的子集

# 提取子集
cortex <- subset(brain, idents = c(1, 2, 3, 5, 6, 7))
# now remove additional cells, use SpatialDimPlots to visualize what to remove
# SpatialDimPlot(cortex,cells.highlight = WhichCells(cortex, expression = #image_imagerow > 400 | image_imagecol < 150))
# 根据不同的条件进行筛选提取子集
cortex <- subset(cortex, anterior1_imagerow > 400 | anterior1_imagecol < 150, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 275 & anterior1_imagecol > 370, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 250 & anterior1_imagecol > 440, invert = TRUE)
# 数据可视化
p1 <- SpatialDimPlot(cortex, crop = TRUE, label = TRUE)
p2 <- SpatialDimPlot(cortex, crop = FALSE, label = TRUE, pt.size.factor = 1, label.size = 3)
p1 + p2
image.png

整合单细胞RNA-seq测序数据

# 加载单细胞RNA-seq数据
allen_reference <- readRDS("~/Downloads/allen_cortex.rds")
# note that setting ncells=3000 normalizes the full dataset but learns noise models on 3k cells this speeds up SCTransform dramatically with no loss in performance
library(dplyr)

# 进行数据标准化结合PCA降维
allen_reference <- SCTransform(allen_reference, ncells = 3000, verbose = FALSE) %>% RunPCA(verbose = FALSE) %>% RunUMAP(dims = 1:30)

# After subsetting, we renormalize cortex
cortex <- SCTransform(cortex, assay = "Spatial", verbose = FALSE) %>% RunPCA(verbose = FALSE)

# the annotation is stored in the 'subclass' column of object metadata
DimPlot(allen_reference, group.by = "subclass", label = TRUE)
image.png
# 识别整合的anchors
anchors <- FindTransferAnchors(reference = allen_reference, query = cortex, normalization.method = "SCT")
# 进行数据映射整合
predictions.assay <- TransferData(anchorset = anchors, refdata = allen_reference$subclass, prediction.assay = TRUE, weight.reduction = cortex[["pca"]])
# 添加预测信息
cortex[["predictions"]] <- predictions.assay

DefaultAssay(cortex) <- "predictions"
# 数据可视化
SpatialFeaturePlot(cortex, features = c("L2/3 IT", "L4"), pt.size.factor = 1.6, ncol = 2, crop = TRUE)
image.png
# 鉴定空间可变基因
cortex <- FindSpatiallyVariableFeatures(cortex, assay = "predictions", selection.method = "markvariogram", features = rownames(cortex), r.metric = 5, slot = "data")
top.clusters <- head(SpatiallyVariableFeatures(cortex), 4)
# 数据可视化
SpatialPlot(object = cortex, features = top.clusters, ncol = 2)
image.png
# marker基因可视化
SpatialFeaturePlot(cortex, features = c("Astro", "L2/3 IT", "L4", "L5 PT", "L5 IT", "L6 CT", "L6 IT", "L6b", "Oligo"), pt.size.factor = 1, ncol = 2, crop = FALSE, alpha = c(0.1, 1))
image.png

使用Seurat处理多个切片数据

brain2 <- LoadData("stxBrain", type = "posterior1")
brain2 <- SCTransform(brain2, assay = "Spatial", verbose = FALSE)
# 使用merge函数合并多个slices
brain.merge <- merge(brain, brain2)

DefaultAssay(brain.merge) <- "SCT"
VariableFeatures(brain.merge) <- c(VariableFeatures(brain), VariableFeatures(brain2))
# 数据降维,聚类与可视化
brain.merge <- RunPCA(brain.merge, verbose = FALSE)
brain.merge <- FindNeighbors(brain.merge, dims = 1:30)
brain.merge <- FindClusters(brain.merge, verbose = FALSE)
brain.merge <- RunUMAP(brain.merge, dims = 1:30)
DimPlot(brain.merge, reduction = "umap", group.by = c("ident", "orig.ident"))
image.png
SpatialDimPlot(brain.merge)
image.png
SpatialFeaturePlot(brain.merge, features = c("Hpca", "Plp1"))
image.png

使用Seurat3处理Slide-seq空间转录组数据

Here, we will be analyzing a dataset generated using Slide-seq v2of the mouse hippocampus.

安装并加载所需的数据集

library(SeuratData)
InstallData("ssHippo")
slide.seq <- LoadData("ssHippo")
View(slide.seq)

数据预处理

# 数据质控,查看原始数据的分布特征
plot1 <- VlnPlot(slide.seq, features = "nCount_Spatial", pt.size = 0, log = TRUE) + NoLegend()
# 对原始count进行log标准化处理
slide.seq$log_nCount_Spatial <- log(slide.seq$nCount_Spatial)
plot2 <- SpatialFeaturePlot(slide.seq, features = "log_nCount_Spatial") + theme(legend.position = "right")
wrap_plots(plot1, plot2)
image.png
# 使用SCTransform进行标准化处理
slide.seq <- SCTransform(slide.seq, assay = "Spatial", ncells = 3000, verbose = FALSE)
# PCA降维
slide.seq <- RunPCA(slide.seq)
slide.seq <- RunUMAP(slide.seq, dims = 1:30)
# 数据聚类分群
slide.seq <- FindNeighbors(slide.seq, dims = 1:30)
slide.seq <- FindClusters(slide.seq, resolution = 0.3, verbose = FALSE)
# 数据可视化
plot1 <- DimPlot(slide.seq, reduction = "umap", label = TRUE)
plot2 <- SpatialDimPlot(slide.seq, stroke = 0)
plot1 + plot2
image.png
# 使用cells.highlight参数高亮特定细胞
SpatialDimPlot(slide.seq, cells.highlight = CellsByIdentities(object = slide.seq, idents = c(0, 7, 13)), facet.highlight = TRUE)
image.png

整合单细胞RNA-seq参考数据

# 加载单细胞RNA-seq数据
ref <- readRDS("~/Downloads/mouse_hippocampus_reference.rds")

# 使用FindTransferAnchors函数识别参考的anchors
anchors <- FindTransferAnchors(reference = ref, query = slide.seq, normalization.method = "SCT", npcs = 50)
# 数据映射进行整合
predictions.assay <- TransferData(anchorset = anchors, refdata = ref$celltype, prediction.assay = TRUE, weight.reduction = slide.seq[["pca"]], dims = 1:50)
# 添加预测信息
slide.seq[["predictions"]] <- predictions.assay

DefaultAssay(slide.seq) <- "predictions"
# 数据可视化
SpatialFeaturePlot(slide.seq, features = c("Dentate Principal cells", "CA3 Principal cells", "Entorhinal cortex", "Endothelial tip", "Ependymal", "Oligodendrocyte"), alpha = c(0.1, 1))
image.png
slide.seq$predicted.id <- GetTransferPredictions(slide.seq)
Idents(slide.seq) <- "predicted.id"
SpatialDimPlot(slide.seq, cells.highlight = CellsByIdentities(object = slide.seq, idents = c("CA3 Principal cells", "Dentate Principal cells", "Endothelial tip")), facet.highlight = TRUE)
image.png

鉴定空间可变基因

Here, we demonstrate the latter with an implementation of Moran’s I available via FindSpatiallyVariableFeatures by setting method = 'moransi'.
Moran’s I computes an overall spatial autocorrelation and gives a statistic (similar to a correlation coefficient) that measures the dependence of a feature on spatial location.
This allows us to rank features based on how spatially variable their expression is.
In order to facilitate quick estimation of this statistic, we implemented a basic binning strategy that will draw a rectangular grid over Slide-seq puck and average the feature and location within each bin. The number of bins in the x and y direction are controlled by the x.cuts and y.cuts parameters respectively. Additionally, while not required, installing the optional Rfast2 package(install.packages('Rfast2')), will significantly decrease the runtime via a more efficient implementation.

DefaultAssay(slide.seq) <- "SCT"
# 识别空间可变基因,使用selection.method = "moransi"方法
slide.seq <- FindSpatiallyVariableFeatures(slide.seq, assay = "SCT", slot = "scale.data", features = VariableFeatures(slide.seq)[1:1000], selection.method = "moransi", x.cuts = 100, y.cuts = 100)
# 数据可视化
SpatialFeaturePlot(slide.seq, features = head(SpatiallyVariableFeatures(slide.seq, selection.method = "moransi"), 6), ncol = 3, alpha = c(0.1, 1), max.cutoff = "q95")
image.png

参考来源:
https://satijalab.org/seurat/v3.1/spatial_vignette.html

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 219,366评论 6 508
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 93,521评论 3 395
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 165,689评论 0 356
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,925评论 1 295
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,942评论 6 392
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,727评论 1 305
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 40,447评论 3 420
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 39,349评论 0 276
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,820评论 1 317
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,990评论 3 337
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 40,127评论 1 351
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,812评论 5 346
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 41,471评论 3 331
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 32,017评论 0 22
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 33,142评论 1 272
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 48,388评论 3 373
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 45,066评论 2 355

推荐阅读更多精彩内容