Author: Evil Genius
With the end of the year approaching, we will pause pushing forward for now and summarize some useful methods instead, with an emphasis on multi-sample niche analysis, cell clustering, neighborhood analysis, and so on.
HD data are gradually becoming more common, and the demands placed on HD analysis are rising along with them. In this post we discuss integration strategies for HD data.
First, there are a few questions we need to think about.
1. How good is the HD data quality, especially the median number of genes captured per bin? If gene capture is very low and clustering still relies on Seurat's default FindVariableFeatures settings, do you think the result will be good? For example, with 8 µm bins at a median of 200 genes and 2,000 highly variable genes requested, what effect will that have? (A short QC sketch follows this list.)
2. Analysis strategy. As most of you already know, HD supports image-based cell segmentation, so how do you choose between segmentation and bin-mode analysis? One example: the BGI (华大) platform offers even higher resolution, yet almost all published BGI spatial papers use bin mode. Why? Because current image segmentation methods still leave room for improvement.
3. Integration strategy. Seurat recommends the BANKSY method, which takes the spatial neighborhood into account; but with multiple samples, each sample's spatial coordinates live in their own coordinate system, so how should they be integrated?
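As a quick sanity check for question 1, here is a minimal sketch for looking at per-bin capture before deciding how many variable features to request. It reuses the sample path and 16 µm bin size from the script below; the metadata column names (nFeature_Spatial.016um, nCount_Spatial.016um) are the ones Seurat derives from the assay name, and nfeatures = 1000 is just an illustrative choice, not a recommendation from the original post.
# inspect per-bin capture for one HD sample (column names assumed from bin.size = 16)
obj <- Load10X_Spatial(data.dir = '/home/samples/DB/Spatial/visium_data/HC_1L', bin.size = c(16))
summary(obj$nFeature_Spatial.016um)   # median genes per 16 µm bin
summary(obj$nCount_Spatial.016um)     # median UMIs per 16 µm bin
# if the median gene count is very low, asking for 2,000 HVGs may mostly pick up noise;
# one option is to request fewer variable features, e.g.
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj, nfeatures = 1000)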
Today we consider three approaches. The first is CCA + BANKSY (Seurat V5). We will not wrap the script up for now; here is an example with two samples.
suppressMessages({
library(Seurat)
library(argparse)
library(dplyr)
library(ggplot2)
library(Banksy)
library(SeuratWrappers)
library(harmony)
})
# load the 16 µm binned matrices for the two Visium HD samples
sample1 <- Load10X_Spatial(data.dir = '/home/samples/DB/Spatial/visium_data/HC_1L', bin.size = c(16))
sample2 <- Load10X_Spatial(data.dir = '/home/samples/DB/Spatial/visium_data/HC_1T', bin.size = c(16))
# normalize before selecting variable features (RunBanksy below uses the "data" slot)
sample1 <- NormalizeData(sample1)
sample2 <- NormalizeData(sample2)
sample1 <- FindVariableFeatures(sample1)
sample2 <- FindVariableFeatures(sample2)
sample1 <- ScaleData(sample1)
sample2 <- ScaleData(sample2)
# we select 50,000 cells and create a new 'sketch' assay
sample1 <- SketchData(
object = sample1,
ncells = 50000,
method = "LeverageScore",
sketched.assay = "sketch"
)
# we select 50,000 cells and create a new 'sketch' assay
sample2 <- SketchData(
object = sample2,
ncells = 50000,
method = "LeverageScore",
sketched.assay = "sketch"
)
DefaultAssay(sample1) <- "sketch"
DefaultAssay(sample2) <- "sketch"
# perform clustering workflow
sample1 <- FindVariableFeatures(sample1)
sample1 <- ScaleData(sample1)
sample1 <- RunPCA(sample1, assay = "sketch", reduction.name = "pca.sketch")
sample2 <- FindVariableFeatures(sample2)
sample2 <- ScaleData(sample2)
sample2 <- RunPCA(sample2, assay = "sketch", reduction.name = "pca.sketch")
#### BANKSY: augment each sample's expression with spatial-neighborhood features
# lambda weights the neighborhood component; k_geom is the number of spatial neighbors
# note: BANKSY is run on the full Spatial.016um assay here, not on the sketch assay
sample1 <- RunBanksy(sample1,
lambda = 0.8, verbose = TRUE,
assay = "Spatial.016um", slot = "data", features = "variable",
k_geom = 50
)
sample2 <- RunBanksy(sample2,
lambda = 0.8, verbose = TRUE,
assay = "Spatial.016um", slot = "data", features = "variable",
k_geom = 50
)
#### PCA on the BANKSY assay
DefaultAssay(sample1) <- "BANKSY"
sample1 <- RunPCA(sample1, assay = "BANKSY", reduction.name = "pca.banksy", features = rownames(sample1), npcs = 30)
DefaultAssay(sample2) <- "BANKSY"
sample2 <- RunPCA(sample2, assay = "BANKSY", reduction.name = "pca.banksy", features = rownames(sample2), npcs = 30)
Note that once BANKSY has been run on each individual sample, the question of integration comes up. Because each sample's BANKSY neighborhoods are computed in its own coordinate system, integration has to happen on the embeddings rather than on the coordinates themselves. First, recall the standard Seurat v5 integration call:
ifnb <- IntegrateLayers(object = ifnb,
method = CCAIntegration,
orig.reduction = "pca",
new.reduction = "integrated.cca",
verbose = FALSE)
The idea is to apply the same call to the BANKSY-derived PCA. First merge the two samples into a single object (in Seurat v5 the expression layers stay split per sample), carrying over the per-sample pca.banksy embeddings:
# merge the objects; merge.dr keeps the pca.banksy reduction from each sample
ifnb <- merge(sample1, sample2, merge.dr = "pca.banksy")
ifnb <- IntegrateLayers(object = ifnb,
method = CCAIntegration,
orig.reduction = "pca.banksy",
assay = "BANKSY",
new.reduction = "integrated.cca",
verbose = FALSE)
That completes the data integration; continuing on:
# re-join layers after integration
ifnb[["Spatial.016um"]] <- JoinLayers(ifnb[['Spatial.016um']])
ifnb <- FindNeighbors(ifnb, reduction = "integrated.cca", dims = 1:30)
ifnb <- FindClusters(ifnb, cluster.name = "banksy_cluster", resolution = 0.5)
That is the first approach; a quick visualization sketch for inspecting the result follows below.
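To get a quick look at whether the CCA + BANKSY result makes sense, here is a minimal sketch. The 'sample' metadata column, the umap.cca reduction name, and the assumption that the merged object keeps sample1's cells before sample2's are all additions of mine, not part of the original script.
# label each cell with its sample of origin (assumes merge keeps sample1 cells first)
ifnb$sample <- rep(c("HC_1L", "HC_1T"), times = c(ncol(sample1), ncol(sample2)))
# UMAP on the integrated embedding, colored by BANKSY clusters and split by sample
ifnb <- RunUMAP(ifnb, reduction = "integrated.cca", dims = 1:30, reduction.name = "umap.cca")
DimPlot(ifnb, reduction = "umap.cca", group.by = "banksy_cluster", split.by = "sample", label = TRUE)
# clusters mapped back onto each tissue section
SpatialDimPlot(ifnb, group.by = "banksy_cluster")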
Next, the second approach: RPCA + BANKSY.
Everything before this stays the same; the key step is:
ifnb <- IntegrateLayers(
object = ifnb, method = RPCAIntegration,
orig.reduction = "pca.banksy", new.reduction = "integrated.rpca",
assay = "BANKSY",
verbose = FALSE
)
Finally, the Harmony + BANKSY approach:
ifnb <- IntegrateLayers(
object = ifnb, method = HarmonyIntegration,
orig.reduction = "pca.banksy", new.reduction = "harmony",assay = "BANKSY",
verbose = FALSE
)
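For the second and third approaches the downstream steps mirror the first one; only the reduction name passed to FindNeighbors changes. A minimal sketch (the cluster.name labels are my own, chosen to keep the three results side by side in one object):
# RPCA variant: build the graph and cluster on the integrated.rpca embedding
ifnb <- FindNeighbors(ifnb, reduction = "integrated.rpca", dims = 1:30)
ifnb <- FindClusters(ifnb, cluster.name = "banksy_cluster_rpca", resolution = 0.5)
# Harmony variant: same steps on the harmony embedding
ifnb <- FindNeighbors(ifnb, reduction = "harmony", dims = 1:30)
ifnb <- FindClusters(ifnb, cluster.name = "banksy_cluster_harmony", resolution = 0.5)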
How well these work, and whether there are any problems, will have to wait until tomorrow; if issues come up, the scripts will be updated.
Life is good, and better with you.