Seurat: Single-Cell Differential Expression Analysis

Load the data

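This vignette highlights some example workflows for performing differential expression in Seurat. For demonstration purposes, we will use the 2,700 PBMC object created in the first guided tutorial. You can download the pre-computed object here.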

library(Seurat)
pbmc <- readRDS(file = "../data/pbmc3k_final.rds")

Perform default differential expression tests

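Most of Seurat's differential expression functionality is accessed through the FindMarkers function. By default, Seurat performs differential expression testing based on the non-parametric Wilcoxon rank sum test. This replaces the previous default test ("bimod"). To test for differential expression between two specific groups of cells, specify the ident.1 and ident.2 parameters.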

# list options for groups to perform differential expression on
levels(pbmc)

## [1] "Naive CD4 T"  "Memory CD4 T" "CD14+ Mono"   "B"            "CD8 T"       
## [6] "FCGR3A+ Mono" "NK"           "DC"           "Platelet"

# Find differentially expressed features between CD14+ and FCGR3A+ Monocytes
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono")
# view results
head(monocyte.de.markers)

	avg_logFC	pct.1	pct.2
FCGR3A	-2.617707	0.131	0.975
LYZ	1.812078	1.000	0.988
RHOC	-1.611576	0.162	0.864
S100A8	2.610695	0.975	0.500
S100A9	2.286734	0.996	0.870
IFITM2	-1.445771	0.677	1.000

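The results data frame contains the following columns:

p_val: p-value (unadjusted)
avg_logFC: log fold-change of the average expression between the two groups. Positive values indicate that the feature is more highly expressed in the first group.
pct.1: the percentage of cells in the first group in which the feature is detected
pct.2: the percentage of cells in the second group in which the feature is detected
p_val_adj: adjusted p-value, based on Bonferroni correction using all features in the dataset.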
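
To make the column definitions concrete, here is a minimal base-R sketch that computes each quantity by hand for a single feature. The expression values and the feature count are made up for illustration, and the fold-change formula (natural log of mean expression with a pseudocount of 1, on log-normalized data) is an assumption about the Seurat v3 convention rather than a copy of its internals.

```r
# Toy log-normalized expression of one feature (hypothetical values)
group1 <- c(0, 0, 1.2, 2.5, 3.1, 2.8, 1.9, 3.3, 2.2, 2.7)  # cells in ident.1
group2 <- c(0, 0, 0, 0, 0, 0, 0.8, 1.1, 0, 0.5)            # cells in ident.2

# p_val: unadjusted p-value from a Wilcoxon rank sum test
# (exact = FALSE because the toy data contain ties)
p_val <- wilcox.test(group1, group2, exact = FALSE)$p.value

# avg_logFC: difference of log mean expression (assumed convention:
# natural log, pseudocount 1, means taken on the un-logged scale)
avg_logFC <- log(mean(expm1(group1)) + 1) - log(mean(expm1(group2)) + 1)

# pct.1 / pct.2: fraction of cells in each group with non-zero expression
pct.1 <- mean(group1 > 0)
pct.2 <- mean(group2 > 0)

# p_val_adj: Bonferroni correction over all features in the dataset
n_features <- 13714  # hypothetical number of features
p_val_adj <- p.adjust(p_val, method = "bonferroni", n = n_features)

result <- data.frame(p_val, avg_logFC, pct.1, pct.2, p_val_adj)
print(result)
```

Note that pct.1 and pct.2 are reported as fractions between 0 and 1, which matches the values shown in the output tables.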

If the ident.2 parameter is omitted or set to NULL, FindMarkers tests for differentially expressed features between the group specified by ident.1 and all other cells.

# Find differentially expressed features between CD14+ Monocytes and all other cells, only
# search for positive markers
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = NULL, only.pos = TRUE)
# view results
head(monocyte.de.markers)

	avg_logFC	pct.1	pct.2
S100A9	3.860873	0.996	0.215
S100A8	3.796640	0.975	0.121
LGALS2	2.634295	0.908	0.059
FCN1	2.352693	0.952	0.151
CD14	1.951644	0.667	0.028
TYROBP	2.111879	0.994	0.265
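
The one-versus-rest grouping and the effect of only.pos can be pictured with a small base-R sketch. The identity labels, gene names, and fold-change values below are hypothetical and only illustrate the selection logic, not how Seurat implements it.

```r
# Hypothetical identity labels for six cells (illustration only)
idents  <- c("CD14+ Mono", "B", "NK", "CD14+ Mono", "DC", "CD8 T")
ident.1 <- "CD14+ Mono"

# With ident.2 = NULL, the comparison group is every cell not in ident.1
in_group1 <- idents == ident.1
in_rest   <- !in_group1

# Hypothetical results table; only.pos = TRUE corresponds to keeping
# features with a positive log fold-change (higher in ident.1)
res <- data.frame(avg_logFC = c(1.8, -0.9, 0.4),
                  row.names = c("geneA", "geneB", "geneC"))
res_pos <- res[res$avg_logFC > 0, , drop = FALSE]
print(res_pos)
```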

Prefilter features or cells to increase the speed of DE testing

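To increase the speed of marker discovery, particularly for large datasets, Seurat allows pre-filtering of features or cells. For example, features that are rarely detected in either group of cells, or features that are expressed at similar average levels, are unlikely to be differentially expressed. Example use cases of the min.pct, logfc.threshold, min.diff.pct, and max.cells.per.ident parameters are demonstrated below.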

# Pre-filter features that are detected at <50% frequency in either CD14+ Monocytes or FCGR3A+
# Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.pct = 0.5))

	avg_logFC	pct.1	pct.2
FCGR3A	-2.617707	0.131	0.975
LYZ	1.812078	1.000	0.988
RHOC	-1.611576	0.162	0.864
S100A8	2.610695	0.975	0.500
S100A9	2.286734	0.996	0.870
IFITM2	-1.445771	0.677	1.000

# Pre-filter features that have less than a two-fold change between the average expression of
# CD14+ Monocytes vs FCGR3A+ Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", logfc.threshold = log(2)))

	avg_logFC	pct.1	pct.2
FCGR3A	-2.617707	0.131	0.975
LYZ	1.812078	1.000	0.988
RHOC	-1.611576	0.162	0.864
S100A8	2.610695	0.975	0.500
S100A9	2.286734	0.996	0.870
IFITM2	-1.445771	0.677	1.000

# Pre-filter features whose detection percentages across the two groups are similar (within
# 0.25)
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.diff.pct = 0.25))

	avg_logFC	pct.1	pct.2
FCGR3A	-2.617707	0.131	0.975
RHOC	-1.611576	0.162	0.864
S100A8	2.610695	0.975	0.500
IFITM2	-1.445771	0.677	1.000
LGALS2	2.049431	0.908	0.265
CDKN1C	-1.007729	0.029	0.506

# Increasing min.pct, logfc.threshold, and min.diff.pct, will increase the speed of DE testing,
# but could also miss features that are prefiltered
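
The logic behind the three feature filters can be sketched in base R on a small table of per-feature summaries. The gene names and values are hypothetical, and the exact boundary handling (>= versus >) is an assumption that may differ from Seurat's implementation.

```r
# Hypothetical per-feature summaries for two groups (not real PBMC values)
stats <- data.frame(
  feature   = c("geneA", "geneB", "geneC", "geneD"),
  pct.1     = c(0.95, 0.60, 0.05, 0.40),
  pct.2     = c(0.20, 0.55, 0.08, 0.90),
  avg_logFC = c(2.10, 0.10, 0.30, -1.20)
)

min.pct         <- 0.1
logfc.threshold <- 0.25
min.diff.pct    <- 0.25

# min.pct: keep features detected in at least min.pct of cells in either group
keep_pct  <- pmax(stats$pct.1, stats$pct.2) >= min.pct
# logfc.threshold: keep features with a large enough absolute log fold-change
keep_fc   <- abs(stats$avg_logFC) >= logfc.threshold
# min.diff.pct: keep features whose detection rates differ enough between groups
keep_diff <- abs(stats$pct.1 - stats$pct.2) >= min.diff.pct

# Only features passing all three filters would go on to be tested
tested <- stats$feature[keep_pct & keep_fc & keep_diff]
print(tested)
```

Here geneB is dropped for its small fold-change and geneC for its low detection rate, so neither is ever tested. That is where the speed-up comes from, and also why a stricter threshold can hide a real but subtle marker.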

# Subsample each group to a maximum of 200 cells. Can be very useful for large clusters, or
# computationally-intensive DE tests
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", max.cells.per.ident = 200))

	avg_logFC	pct.1	pct.2
FCGR3A	-2.6177073	0.131	0.975
LYZ	1.8120776	1.000	0.988
S100A8	2.6106955	0.975	0.500
S100A9	2.2867339	0.996	0.870
IFITM2	-1.4457715	0.677	1.000
RPS19	-0.7563274	0.990	1.000
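
Subsampling with max.cells.per.ident can be pictured as capping each group at a fixed number of randomly chosen cells. The sketch below uses base R with hypothetical group sizes and cell names; it illustrates the idea and is not Seurat's own code.

```r
set.seed(1)

# Hypothetical cell labels: two groups of unequal size
idents <- c(rep("CD14+ Mono", 500), rep("FCGR3A+ Mono", 150))
cells  <- paste0("cell_", seq_along(idents))
max.cells.per.ident <- 200

# Keep every cell of a small group; randomly sample a large group down to the cap
subsampled <- unlist(lapply(split(cells, idents), function(x) {
  if (length(x) > max.cells.per.ident) sample(x, max.cells.per.ident) else x
}), use.names = FALSE)

# Group sizes after subsampling
counts <- table(idents[match(subsampled, cells)])
print(counts)
```

Because the cells are drawn at random, results can vary slightly between runs unless a seed is fixed.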

Perform DE analysis using alternative tests

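The following differential expression tests are currently supported:

"wilcox": Wilcoxon rank sum test (default)
"bimod": Likelihood-ratio test for single-cell feature expression (McDavid et al., Bioinformatics, 2013)
"roc": Standard AUC classifier
"t": Student's t-test
"poisson": Likelihood ratio test assuming an underlying Poisson distribution. Use only for UMI-based datasets
"negbinom": Likelihood ratio test assuming an underlying negative binomial distribution. Use only for UMI-based datasets
"LR": Uses a logistic regression framework to determine differentially expressed genes. Constructs a logistic regression model predicting group membership based on each feature individually and compares this to a null model with a likelihood ratio test.
"MAST": GLM framework that treats cellular detection rate as a covariate (Finak et al., Genome Biology, 2015) (installation instructions)
"DESeq2": DE based on a model using the negative binomial distribution (Love et al., Genome Biology, 2014) (installation instructions)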
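
As an illustration of the idea behind the "LR" option described above, the following base-R sketch fits a logistic regression that predicts group membership from one feature and compares it with an intercept-only null model. The data are simulated and the use of glm with anova is an assumption made for a self-contained example; it is not taken from Seurat's source.

```r
set.seed(7)

# Simulated expression of one feature in two hypothetical groups of cells
group <- factor(rep(c("ident.1", "ident.2"), each = 100))
expr  <- c(rnorm(100, mean = 2), rnorm(100, mean = 1))

# Full model: group membership predicted from expression
full <- glm(group ~ expr, family = binomial)
# Null model: intercept only
null <- glm(group ~ 1, family = binomial)

# Likelihood ratio test comparing the nested models
lrt   <- anova(null, full, test = "Chisq")
p_val <- lrt[["Pr(>Chi)"]][2]
print(p_val)
```

A small p-value means that expression of the feature helps separate the two groups better than chance, which is the sense in which the feature is called differentially expressed.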

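For MAST and DESeq2, make sure these packages are installed separately so that they can be used as part of Seurat. Once installed, use the test.use parameter to specify which DE test to use.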
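
One way to guard against a missing package is to check for it before choosing a test. This is a small sketch using base R's requireNamespace; the fallback to "wilcox" is an illustrative choice, not a recommendation from the vignette.

```r
# Check whether the optional packages are available without attaching them
optional_pkgs <- c(MAST = "MAST", DESeq2 = "DESeq2")
available <- vapply(optional_pkgs, requireNamespace, logical(1), quietly = TRUE)
print(available)

# Fall back to the default Wilcoxon test when MAST is not installed
test.use <- if (available[["MAST"]]) "MAST" else "wilcox"
print(test.use)
```

The resulting test.use value can then be passed to FindMarkers in the same way as in the calls below.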

# Test for DE features using the MAST package
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "MAST"))

	avg_logFC	pct.1	pct.2
LYZ	1.812078	1.000	0.988
FCGR3A	-2.617707	0.131	0.975
S100A9	2.286734	0.996	0.870
S100A8	2.610695	0.975	0.500
IFITM2	-1.445771	0.677	1.000
LGALS2	2.049431	0.908	0.265

# Test for DE features using the DESeq2 package. Throws an error if DESeq2 has not already been
# installed. Note that the DESeq2 workflows can be computationally intensive for large datasets,
# but are incompatible with some feature pre-filtering options. We therefore suggest initially
# limiting the number of cells used for testing.
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "DESeq2", max.cells.per.ident = 50))

	avg_logFC	pct.1	pct.2
S100A9	1.759457	0.996	0.870
LYZ	1.377950	1.000	0.988
S100A8	1.929894	0.975	0.500
FCGR3A	-2.044779	0.131	0.975
RPS19	-1.119358	0.990	1.000
IFITM2	-1.533646	0.677	1.000

Acknowledgements

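We thank the authors of the MAST and DESeq2 packages for their kind assistance and advice. We also point users to the following study by Charlotte Soneson and Mark Robinson, which performs a careful and extensive evaluation of methods for single-cell differential expression testing.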

Reference:

https://satijalab.org/seurat/v3.1/de_vignette.html