FindMarkers
WeChat official account: BBio
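FindMarkers identifies differentially expressed genes between the clusters produced by clustering. Ideally, the marker genes of each cluster rank near the top of that cluster's upregulated genes.
# FindMarkers results are sorted by p_val in ascending order; the Wilcoxon test uses the values in the data slot.
# pct.1 and pct.2 are the fractions of cells expressing the gene in ident.1 and ident.2, respectively.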
df <- FindMarkers(pbmc, ident.1=1, ident.2=0, slot="data", logfc.threshold=0.25, min.pct=0.1, test.use = "wilcox")
head(df)
#                 p_val avg_logFC pct.1 pct.2    p_val_adj
# TYMP     1.702818e-11  2.539289  1.00 0.111 3.916481e-09
# CST3     4.469249e-11  2.552769  1.00 0.306 1.027927e-08
# S100A8   5.334985e-11  4.037048  0.96 0.111 1.227047e-08
# LYZ      6.997602e-11  3.082150  1.00 0.417 1.609449e-08
# HLA-DRB1 3.287672e-10  3.325130  0.88 0.083 7.561646e-08
# HLA-DPB1 4.061018e-10  3.547416  0.88 0.083 9.340340e-08
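Since the markers of a cluster are expected among its upregulated genes, a common follow-up is to keep only rows with positive avg_logFC and rank them by effect size. The sketch below rebuilds the six rows printed above as a plain data frame so it runs without Seurat; the name df_head and the 0.05 cutoff on p_val_adj are illustrative assumptions, not part of the original analysis.

```r
# Toy copy of the head(df) output shown above (values taken from the post).
df_head <- data.frame(
  p_val     = c(1.702818e-11, 4.469249e-11, 5.334985e-11,
                6.997602e-11, 3.287672e-10, 4.061018e-10),
  avg_logFC = c(2.539289, 2.552769, 4.037048, 3.082150, 3.325130, 3.547416),
  pct.1     = c(1.00, 1.00, 0.96, 1.00, 0.88, 0.88),
  pct.2     = c(0.111, 0.306, 0.111, 0.417, 0.083, 0.083),
  p_val_adj = c(3.916481e-09, 1.027927e-08, 1.227047e-08,
                1.609449e-08, 7.561646e-08, 9.340340e-08),
  row.names = c("TYMP", "CST3", "S100A8", "LYZ", "HLA-DRB1", "HLA-DPB1")
)

# Keep genes upregulated in ident.1 (positive avg_logFC) that pass an
# adjusted p-value cutoff; 0.05 is an assumed, illustrative threshold.
up <- df_head[df_head$avg_logFC > 0 & df_head$p_val_adj < 0.05, ]

# Rank by effect size, largest first.
up <- up[order(up$avg_logFC, decreasing = TRUE), ]
print(rownames(up))
```

On a real FindMarkers result the same two lines of subsetting and ordering apply to df directly.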
avg_logFC
Is avg_logFC in the results the natural logarithm of the fold change? Let's look at the source code.
getAnywhere('FindMarkers.default')
# mean.fxn <- if (is.null(x = reduction) && slot != "scale.data") {
# switch(EXPR = slot, data = function(x) {
# return(log(x = rowMeans(x = expm1(x = x)) + pseudocount.use))
# }, function(x) {
# return(log(x = rowMeans(x = x) + pseudocount.use))
# })
# }
# else {
# rowMeans
# }
# data.1 <- mean.fxn(data[features, cells.1, drop = FALSE])
# data.2 <- mean.fxn(data[features, cells.2, drop = FALSE])
# total.diff <- (data.1 - data.2)
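The quoted source uses log(), which is the natural logarithm in R, so for slot = "data" the value is the natural log of the ratio of the (mean + pseudocount) terms. The sketch below replays that branch on a toy matrix using base R only; the matrix values, cell names, and pseudocount.use = 1 are assumptions made for illustration.

```r
# Toy "counts-like" values (genes x cells); numbers are made up.
raw <- matrix(c(1, 3, 5, 7,    # geneA
                2, 2, 2, 2),   # geneB
              nrow = 2, byrow = TRUE,
              dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4)))
data <- log1p(raw)  # stands in for the log-normalized "data" slot

cells.1 <- c("cell3", "cell4")
cells.2 <- c("cell1", "cell2")
pseudocount.use <- 1  # assumed value for this sketch

# Same formula as the slot = "data" branch quoted above.
mean.fxn <- function(x) log(rowMeans(expm1(x)) + pseudocount.use)

data.1 <- mean.fxn(data[, cells.1, drop = FALSE])
data.2 <- mean.fxn(data[, cells.2, drop = FALSE])
total.diff <- data.1 - data.2

# Equivalent view: natural log of the ratio of (mean + pseudocount).
ratio <- (rowMeans(raw[, cells.1]) + pseudocount.use) /
         (rowMeans(raw[, cells.2]) + pseudocount.use)
print(all.equal(total.diff, log(ratio)))

# Converting to log2 fold change only needs a change of base.
log2fc <- total.diff / log(2)
```

For geneA the group means are 6 and 2, so total.diff is log(7/3); geneB has equal means and a difference of 0.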
logfc.threshold and min.pct
Do the fold-change and expression-fraction thresholds affect run time?
system.time(FindMarkers(pbmc, ident.1=1, logfc.threshold=0.25, min.pct=0.1))
#user system elapsed
#21.154 0.435 21.590
system.time(FindMarkers(pbmc, ident.1=1, logfc.threshold=0, min.pct=0))
#user system elapsed
#330.599 1.365 332.409
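Relaxing the thresholds increases run time substantially (about 22 s versus 332 s elapsed, roughly 15-fold). The source code shows why: both min.pct and logfc.threshold are applied at the top of the function to filter genes before testing, so looser thresholds leave more genes to analyze.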
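The effect of the two pre-filters can be sketched on a toy matrix by counting how many genes would reach the statistical test. This is only an illustration of the idea in base R, not Seurat's actual code: the helper n_tested, the strict ">" comparisons, the linear-scale toy values, and the pseudocount of 1 are all assumptions.

```r
# Toy expression matrix on a linear scale: 4 genes x 6 cells (made-up values).
expr <- matrix(c(0, 0, 0, 5, 6, 7,     # g1: strong and specific to group 1
                 1, 1, 1, 1, 1, 1.3,   # g2: expressed everywhere, tiny change
                 0, 0, 0, 0, 0, 0.2,   # g3: detected in a single cell
                 0, 0, 0, 0, 0, 0),    # g4: never detected
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("g", 1:4), paste0("c", 1:6)))
cells.1 <- paste0("c", 4:6)
cells.2 <- paste0("c", 1:3)

# Hypothetical helper: number of genes that survive both pre-filters.
n_tested <- function(expr, min.pct, logfc.threshold, pseudocount.use = 1) {
  x1 <- expr[, cells.1, drop = FALSE]
  x2 <- expr[, cells.2, drop = FALSE]
  # Fraction of cells in each group with non-zero expression.
  pct.1 <- rowMeans(x1 > 0)
  pct.2 <- rowMeans(x2 > 0)
  keep_pct <- pmax(pct.1, pct.2) > min.pct
  # Natural-log difference of (mean + pseudocount) between groups.
  diff <- log(rowMeans(x1) + pseudocount.use) -
          log(rowMeans(x2) + pseudocount.use)
  keep_fc <- abs(diff) > logfc.threshold
  sum(keep_pct & keep_fc)
}

print(n_tested(expr, min.pct = 0.1, logfc.threshold = 0.25))  # 1
print(n_tested(expr, min.pct = 0,   logfc.threshold = 0))     # 3
```

With the default-like thresholds only g1 is tested; with both thresholds at 0, g2 and g3 are tested as well, which mirrors why the second system.time call above takes so much longer.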
Marking the markers
# B cells
FeaturePlot(object = pbmc_small, features = c('MS4A1', 'CD19', 'CD79B'), ncol = 3)