前沿
单细胞常规分析用得最多的软件非R语言的Seurat包莫属,当然还有Python语言的Scanpy,但是对于入门者来说,R语言更容易上手,而且很多单细胞分析的扩展包也是基于Seurat的,或者很多R包的输入就是Seurat参数。
然而,对于数据量较大的情况,R语言就显得很鸡肋,不仅耗运行内存而且很慢,因此基于Python的软件包更为适合分析大数据,而且R语言能实现的Python基本上都能实现。随着单细胞组学的迅速发展,未来数据也将迎来超大量级的产出,基于Python的软件包应该更有前景。
言归正传,因为Seurat的常规聚类基本不变,本文主要把此流程封装成函数,这样之后调用起来都比较方便和简洁。
-
常规流程版
下面的函数经过NormalizeData和ScaleData常规聚类步骤,还可以根据需求设置参数。
- obj,Seurat对象
- mt.pattern或mt.list,mt.pattern模糊匹配线粒体基因名,mt.list给定线粒体基因向量
- dim.use,PCA主成分数目
- mt.cutoff,线粒体百分比阈值
- nf.low或nf.high,基因数下限或上下
- nfeatures,选取高变基因数目
- res,亚群分辨率
seob_cluster <- function(obj,
mt.pattern="^MT-",mt.list=NULL,dim.use=30,mt.cutoff=10,
nf.low=200,nf.high=6000,nfeatures=3000,
res=1) {
all <- obj
if (is.null(mt.list)) {
all[["percent.mt"]] <- PercentageFeatureSet(all, pattern = mt.pattern)
}else{
mt.list <- mt.list[which(mt.list %in% rownames(all))]
all[["percent.mt"]] <- PercentageFeatureSet(all, features = mt.list)
}
all <- subset(all, subset = nFeature_RNA > nf.low & percent.mt < mt.cutoff & nFeature_RNA < nf.high)
all <- NormalizeData(all, normalization.method = "LogNormalize", scale.factor = 10000)
all <- FindVariableFeatures(all, selection.method = "vst", nfeatures = nfeatures)
all.genes <- rownames(all)
all <- ScaleData(all, features = all.genes)
all <- RunPCA(all, features = VariableFeatures(all), npcs = 40, verbose = F)
all <- FindNeighbors(all, dims = 1:dim.use)
all <- FindClusters(all, resolution = res)
all <- RunUMAP(all, dims = 1:dim.use)
}
-
SCTransform版
根据官网介绍,SCTransform处理更为合理,因此也可以使用以下流程,参数与上面的函数一致。
sct_seob_cluster <- function(obj,
mt.pattern="^MT-",mt.list=NULL,dim.use=30,mt.cutoff=10,
nf.low=200,nf.high=6000,nfeatures=3000,
res=1) {
pbmc <- obj
if (is.null(mt.list)) {
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = mt.pattern)
}else{
mt.list <- mt.list[which(mt.list %in% rownames(pbmc))]
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, features = mt.list)
}
pbmc <- subset(pbmc, subset = nFeature_RNA > nf.low & percent.mt < mt.cutoff & nFeature_RNA < nf.high)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = nfeatures)
if (!requireNamespace("glmGamPoi", quietly = TRUE)) {
pbmc <- SCTransform(pbmc, vars.to.regress = "percent.mt", verbose = FALSE)
}else{
pbmc <- SCTransform(pbmc, method = "glmGamPoi", vars.to.regress = "percent.mt", verbose = FALSE)
}
pbmc <- RunPCA(pbmc, features = VariableFeatures(pbmc), npcs = 40, verbose = F)
pbmc <- RunUMAP(pbmc, dims = 1:dim.use, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:dim.use, verbose = FALSE)
pbmc <- FindClusters(pbmc, verbose = FALSE, resolution = res)
}