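This vignette highlights some example workflows for performing differential expression in Seurat. For demonstration purposes, we will use the 2,700 PBMC object created in the first guided tutorial. You can download the pre-computed object here.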
pbmc <- readRDS(file = "../data/pbmc3k_final.rds")
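Most of Seurat's differential expression functionality is accessed through the FindMarkers function. By default, Seurat performs differential expression testing based on the non-parametric Wilcoxon rank sum test. This replaces the previous default test ("bimod"). To test for differential expression between two specific groups of cells, specify the ident.1 and ident.2 parameters.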
# list options for groups to perform differential expression on
levels(pbmc)
## [1] "Naive CD4 T" "Memory CD4 T" "CD14+ Mono" "B" "CD8 T"
## [6] "FCGR3A+ Mono" "NK" "DC" "Platelet"
# Find differentially expressed features between CD14+ and FCGR3A+ Monocytes
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono")
# view results
head(monocyte.de.markers)
feature | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
FCGR3A | 0 | -2.617707 | 0.131 | 0.975 | 0 |
LYZ | 0 | 1.812078 | 1.000 | 0.988 | 0 |
RHOC | 0 | -1.611576 | 0.162 | 0.864 | 0 |
S100A8 | 0 | 2.610695 | 0.975 | 0.500 | 0 |
S100A9 | 0 | 2.286734 | 0.996 | 0.870 | 0 |
IFITM2 | 0 | -1.445771 | 0.677 | 1.000 | 0 |
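- p_val: p-value (unadjusted).
- avg_logFC: log fold-change of the average expression between the two groups. Positive values indicate that the feature is more highly expressed in the first group.
- pct.1: the percentage of cells in the first group in which the feature is detected.
- pct.2: the percentage of cells in the second group in which the feature is detected.
- p_val_adj: adjusted p-value, based on Bonferroni correction using all features in the dataset.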
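To make the column definitions concrete, the following is a minimal base R sketch that computes the same kinds of quantities for a single made-up feature. It is not Seurat's internal code: the expression values, the feature count `n_features`, and the exact fold-change formula (a natural-log difference of mean expression with a pseudocount of 1) are assumptions chosen for illustration.

```r
# Toy illustration in base R; all values below are invented, not taken from pbmc.
# Hypothetical log-normalized expression of one feature in two groups of 50 cells.
group1 <- c(rep(0, 5), seq(0.5, 3, length.out = 45))    # detected in 45 of 50 cells
group2 <- c(rep(0, 30), seq(0.2, 1.5, length.out = 20)) # detected in 20 of 50 cells

# pct.1 / pct.2: fraction of cells in each group with non-zero expression
pct.1 <- mean(group1 > 0)
pct.2 <- mean(group2 > 0)

# avg_logFC (assumed formula): difference of log mean expression, pseudocount of 1.
# A positive value means higher expression in the first group.
avg_logFC <- log(mean(expm1(group1)) + 1) - log(mean(expm1(group2)) + 1)

# p_val: unadjusted p-value from the Wilcoxon rank sum test
# (exact = FALSE because the many zeros create ties)
p_val <- wilcox.test(group1, group2, exact = FALSE)$p.value

# p_val_adj: Bonferroni correction over an assumed number of tested features
n_features <- 13714  # hypothetical feature count, for illustration only
p_val_adj <- p.adjust(p_val, method = "bonferroni", n = n_features)

data.frame(p_val, avg_logFC, pct.1, pct.2, p_val_adj)
```

The Bonferroni step simply multiplies the raw p-value by the number of features and caps the result at 1, which is why adjusted values are never smaller than the unadjusted ones.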
# Find differentially expressed features between CD14+ Monocytes and all other cells, only
# search for positive markers
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = NULL, only.pos = TRUE)
# view results
head(monocyte.de.markers)
feature | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
S100A9 | 0 | 3.860873 | 0.996 | 0.215 | 0 |
S100A8 | 0 | 3.796640 | 0.975 | 0.121 | 0 |
LGALS2 | 0 | 2.634295 | 0.908 | 0.059 | 0 |
FCN1 | 0 | 2.352693 | 0.952 | 0.151 | 0 |
CD14 | 0 | 1.951644 | 0.667 | 0.028 | 0 |
TYROBP | 0 | 2.111879 | 0.994 | 0.265 | 0 |
# Pre-filter features that are detected at <50% frequency in either CD14+ Monocytes or FCGR3A+
# Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.pct = 0.5))
feature | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
FCGR3A | 0 | -2.617707 | 0.131 | 0.975 | 0 |
LYZ | 0 | 1.812078 | 1.000 | 0.988 | 0 |
RHOC | 0 | -1.611576 | 0.162 | 0.864 | 0 |
S100A8 | 0 | 2.610695 | 0.975 | 0.500 | 0 |
S100A9 | 0 | 2.286734 | 0.996 | 0.870 | 0 |
IFITM2 | 0 | -1.445771 | 0.677 | 1.000 | 0 |
# Pre-filter features that have less than a two-fold change between the average expression of
# CD14+ Monocytes vs FCGR3A+ Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", logfc.threshold = log(2)))
feature | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
FCGR3A | 0 | -2.617707 | 0.131 | 0.975 | 0 |
LYZ | 0 | 1.812078 | 1.000 | 0.988 | 0 |
RHOC | 0 | -1.611576 | 0.162 | 0.864 | 0 |
S100A8 | 0 | 2.610695 | 0.975 | 0.500 | 0 |
S100A9 | 0 | 2.286734 | 0.996 | 0.870 | 0 |
IFITM2 | 0 | -1.445771 | 0.677 | 1.000 | 0 |
# Pre-filter features whose detection percentages across the two groups are similar (within
# 0.25)
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.diff.pct = 0.25))
feature | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
FCGR3A | 0 | -2.617707 | 0.131 | 0.975 | 0 |
RHOC | 0 | -1.611576 | 0.162 | 0.864 | 0 |
S100A8 | 0 | 2.610695 | 0.975 | 0.500 | 0 |
IFITM2 | 0 | -1.445771 | 0.677 | 1.000 | 0 |
LGALS2 | 0 | 2.049431 | 0.908 | 0.265 | 0 |
CDKN1C | 0 | -1.007729 | 0.029 | 0.506 | 0 |
# Increasing min.pct, logfc.threshold, and min.diff.pct will speed up DE testing, but features
# removed by these pre-filters are not tested and may therefore be missed
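The three pre-filters above can be pictured as simple threshold checks applied to per-feature summaries before any statistical test is run. The sketch below is a hedged approximation in base R, not Seurat's implementation: the `stats` table and its gene names are hypothetical, and the exact comparison rules are assumptions.

```r
# Hypothetical per-feature summaries for two groups (values invented for illustration)
stats <- data.frame(
  feature   = c("geneA", "geneB", "geneC", "geneD"),
  pct.1     = c(0.95, 0.30, 0.60, 0.05),
  pct.2     = c(0.90, 0.10, 0.20, 0.08),
  avg_logFC = c(0.10, 1.20, 0.80, -0.20),
  stringsAsFactors = FALSE
)

min.pct         <- 0.5     # feature must be detected in >= 50% of cells in either group
logfc.threshold <- log(2)  # at least a two-fold change (natural log scale)
min.diff.pct    <- 0.25    # detection rates must differ by at least 0.25

keep_pct  <- pmax(stats$pct.1, stats$pct.2) >= min.pct
keep_fc   <- abs(stats$avg_logFC) >= logfc.threshold
keep_diff <- abs(stats$pct.1 - stats$pct.2) >= min.diff.pct

# Only features passing every filter would go on to be tested
kept <- stats$feature[keep_pct & keep_fc & keep_diff]
print(kept)
```

In this toy table only geneC passes all three filters: geneA fails the fold-change threshold, while geneB and geneD are detected too rarely in both groups.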
# Subsample each group to a maximum of 200 cells. Can be very useful for large clusters, or
# computationally-intensive DE tests
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", max.cells.per.ident = 200))
feature | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
FCGR3A | 0 | -2.6177073 | 0.131 | 0.975 | 0 |
LYZ | 0 | 1.8120776 | 1.000 | 0.988 | 0 |
S100A8 | 0 | 2.6106955 | 0.975 | 0.500 | 0 |
S100A9 | 0 | 2.2867339 | 0.996 | 0.870 | 0 |
IFITM2 | 0 | -1.4457715 | 0.677 | 1.000 | 0 |
RPS19 | 0 | -0.7563274 | 0.990 | 1.000 | 0 |
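As a rough picture of what capping each group at a maximum number of cells involves, here is a small base R sketch of per-identity downsampling. The cell names, group sizes, and the `max.cells` variable are hypothetical, and this is not necessarily how Seurat performs the subsampling internally.

```r
set.seed(1)  # subsampling is random, so fix the seed for reproducibility

# Hypothetical identity labels for 650 cells (group sizes invented for illustration)
idents <- c(rep("CD14+ Mono", 450), rep("FCGR3A+ Mono", 150), rep("DC", 50))
names(idents) <- paste0("cell", seq_along(idents))

max.cells <- 200

# Split cell names by identity, then downsample only groups above the cap
cells_by_ident <- split(names(idents), idents)
subsampled <- lapply(cells_by_ident, function(cells) {
  if (length(cells) > max.cells) sample(cells, max.cells) else cells
})

lengths(subsampled)
```

Because the retained cells are chosen at random, the ranking of features can differ slightly from a run on all cells, which is consistent with the table above not matching the earlier full-data results row for row.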
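The test.use parameter selects among the following differential expression tests:
- "wilcox": Wilcoxon rank sum test (default)
- "bimod": likelihood-ratio test for single-cell feature expression (McDavid et al., Bioinformatics, 2013)
- "roc": standard AUC classifier
- "t": Student's t-test
- "poisson": likelihood-ratio test assuming an underlying Poisson distribution. Use only for UMI-based datasets
- "negbinom": likelihood-ratio test assuming an underlying negative binomial distribution. Use only for UMI-based datasets
- "LR": uses a logistic regression framework to determine differentially expressed genes. Constructs a logistic regression model predicting group membership based on each feature individually and compares this to a null model with a likelihood-ratio test.
- "MAST": GLM framework that treats the cellular detection rate as a covariate (Finak et al., Genome Biology, 2015) (installation instructions)
- "DESeq2": DE based on a model using the negative binomial distribution (Love et al., Genome Biology, 2014) (installation instructions)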
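For MAST and DESeq2, please ensure that these packages are installed separately so that they can be used as part of Seurat. Once installed, use the test.use parameter to specify which DE test to use.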
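The logistic regression ("LR") option described in the list above can be sketched in base R for a single feature. This is a minimal illustration of the general technique (a likelihood-ratio test of a one-predictor logistic model against an intercept-only null model), not Seurat's implementation; the toy vectors `group` and `expr` are invented.

```r
# Toy data: group membership (0 = second group, 1 = first group) and the
# expression of one hypothetical feature in 20 cells
group <- rep(c(0, 1), each = 10)
expr  <- c(0.0, 0.2, 0.0, 0.5, 0.1, 0.0, 0.9, 0.3, 0.0, 1.6,
           1.2, 0.8, 1.5, 0.4, 2.0, 1.1, 0.0, 1.7, 0.9, 1.3)

# Full model: predict group membership from the feature's expression
full <- glm(group ~ expr, family = binomial)
# Null model: intercept only
null <- glm(group ~ 1, family = binomial)

# Likelihood-ratio statistic: drop in deviance from the null to the full model,
# compared against a chi-squared distribution with 1 degree of freedom
lr_stat <- null$deviance - full$deviance
p_val <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

c(lr_stat = lr_stat, p_val = p_val)
```

In a real analysis this test would be repeated once per feature, and the resulting p-values would then be corrected for multiple testing.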
# Test for DE features using the MAST package
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "MAST"))
feature | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
LYZ | 0 | 1.812078 | 1.000 | 0.988 | 0 |
FCGR3A | 0 | -2.617707 | 0.131 | 0.975 | 0 |
S100A9 | 0 | 2.286734 | 0.996 | 0.870 | 0 |
S100A8 | 0 | 2.610695 | 0.975 | 0.500 | 0 |
IFITM2 | 0 | -1.445771 | 0.677 | 1.000 | 0 |
LGALS2 | 0 | 2.049431 | 0.908 | 0.265 | 0 |
# Test for DE features using the DESeq2 package. Throws an error if DESeq2 has not already been
# installed. Note that the DESeq2 workflows can be computationally intensive for large datasets
# and are incompatible with some feature pre-filtering options. We therefore suggest initially
# limiting the number of cells used for testing.
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "DESeq2", max.cells.per.ident = 50))
feature | p_val | avg_logFC | pct.1 | pct.2 | p_val_adj |
S100A9 | 0 | 1.759457 | 0.996 | 0.870 | 0 |
LYZ | 0 | 1.377950 | 1.000 | 0.988 | 0 |
S100A8 | 0 | 1.929894 | 0.975 | 0.500 | 0 |
FCGR3A | 0 | -2.044779 | 0.131 | 0.975 | 0 |
RPS19 | 0 | -1.119358 | 0.990 | 1.000 | 0 |
IFITM2 | 0 | -1.533646 | 0.677 | 1.000 | 0 |
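We thank the authors of the MAST and DESeq2 packages for their kind assistance and advice. We also point users to the following study by Charlotte Soneson and Mark Robinson, which performs a careful and extensive evaluation of methods for single-cell differential expression testing.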