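While recently computing between-group differential peaks for one specific cluster within a subdivided cell subpopulation, I ran into the problems described below.
Reference script (functions used: getMarkerFeatures, getMarkers):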
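library(ArchR)
name <- 'CD4 naive'
# Group labels take the form "<cell type>,<group>", e.g. "CD4 naive,G1".
useGroups <- paste0(name, ',', 'G1')
bgdGroups <- paste0(name, ',', 'G2')
# groupBy must name the cellColData column that stores these combined labels
# (assumed here to be "Tname.g", the column used later in this post).
groupBy <- 'Tname.g'
markersPeaks <- getMarkerFeatures(
  ArchRProj = projHeme,
  useMatrix = "PeakMatrix",  # differential peaks, so test the PeakMatrix rather than the GeneScoreMatrix
  groupBy = groupBy,
  testMethod = "wilcoxon",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  useGroups = useGroups,
  bgdGroups = bgdGroups)
cutoff <- "FDR <= 0.1 & (Log2FC >= 0.5 | Log2FC <= -0.5)"
peaksList <- getMarkers(markersPeaks, cutOff = cutoff)
all.peaks <- as.data.frame(peaksList)
dim(all.peaks)  # number of peaks passing the cutoff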
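To make the cutoff string less opaque, here is a minimal base R sketch of how a condition like this selects rows of a result table. The table toy and its values are invented for illustration, and evaluating the string with eval(parse()) is only an assumption about the general mechanism, not a description of how getMarkers is implemented.

```r
# Toy stand-in for a differential test table (values are invented).
toy <- data.frame(
  peak   = paste0("peak", 1:5),
  Log2FC = c(1.2, -0.8, 0.3, -0.1, 0.6),
  FDR    = c(0.01, 0.05, 0.02, 0.50, 0.30),
  stringsAsFactors = FALSE
)
cutoff <- "FDR <= 0.1 & (Log2FC >= 0.5 | Log2FC <= -0.5)"

# Evaluate the condition with the table's columns in scope.
keep <- eval(parse(text = cutoff), envir = toy)

# The two-sided Log2FC condition is equivalent to abs(Log2FC) >= 0.5.
keep_abs <- with(toy, FDR <= 0.1 & abs(Log2FC) >= 0.5)

selected <- toy$peak[keep]
print(selected)
```

Only peak1 and peak2 satisfy both the FDR and the fold-change condition in this toy table.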
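1) Too few differential peaks with the default parameters?
My guess is that this happens because the test is run on all union peaks (usually well over one hundred thousand) and the cutoff uses the multiple-testing-corrected FDR. When the between-group differences within the subset of cells are small, the FDR threshold becomes too strict (over-correction), so in this situation it works better to filter on the raw p-value: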
cutoff<-"Pval <= 0.05 & (Log2FC >= 0.5 | Log2FC <= -0.5)"
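The effect of the number of tested peaks on the FDR can be illustrated with p.adjust from base R. This is a sketch on synthetic p-values, not ArchR output: 50 peaks are given a raw p-value of 1e-4 and all remaining peaks are spread evenly between 0.01 and 1. The helper name fdr_of_top is hypothetical, and Benjamini-Hochberg is assumed as the correction method.

```r
# Benjamini-Hochberg FDR of 50 strong peaks among m tested peaks (synthetic p-values).
fdr_of_top <- function(m, n_top = 50, p_top = 1e-4) {
  pvals <- c(rep(p_top, n_top), seq(0.01, 1, length.out = m - n_top))
  # All n_top peaks share the same adjusted value, so return the first one.
  p.adjust(pvals, method = "BH")[1]
}

fdr_small <- fdr_of_top(1000)    # 1,000 peaks tested
fdr_large <- fdr_of_top(100000)  # 100,000 union peaks tested
print(c(fdr_small, fdr_large))
```

The same 50 peaks get an adjusted value of about 0.002 when 1,000 peaks are tested and about 0.2 when 100,000 are tested, so they fail an FDR <= 0.1 cutoff in the second case even though their raw p-value easily passes Pval <= 0.05.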
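2) Top differential peaks defined by raw p-value look unreliable, or show no clear difference in plotBrowserTrack?
The differential peaks we obtain have significant p-values and fairly large fold changes, so why is the difference hard to see in the browser track?
The main reason is that these peaks have too little coverage in the subset of cells being compared. To address this, we can borrow the approach Seurat uses for scRNA-seq and define two variables, pct.1 and pct.2: the fraction of cells in each group in which the peak is detected.
Reference script: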
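PeakMatrix <- getMatrixFromProject(
  ArchRProj = projHeme,
  useMatrix = 'PeakMatrix',
  useSeqnames = NULL,
  verbose = TRUE,
  binarize = TRUE,  # 0/1 accessibility, so row sums count the cells in which a peak is open
  threads = getArchRThreads(),
  logFile = createLogFile("getMatrixFromProject"))
# Extract the sparse peak-by-cell matrix through the accessor instead of nested slots.
Peak_data <- SummarizedExperiment::assay(PeakMatrix, 'PeakMatrix')
# Name is the cell-type label; Name.G1 and Name.G2 are the two combined "cell type,group" labels.
cellID <- rownames(subset(projHeme@cellColData, Tname == Name))
cellID1 <- rownames(subset(projHeme@cellColData, Tname.g == Name.G1))
cellID2 <- rownames(subset(projHeme@cellColData, Tname.g == Name.G2))
# Keep the matrices sparse instead of converting to data.frame, which would densify them.
subPeak_data <- Peak_data[, cellID, drop = FALSE]
subPeak_data1 <- Peak_data[, cellID1, drop = FALSE]
subPeak_data2 <- Peak_data[, cellID2, drop = FALSE]
n <- nrow(subPeak_data)  # number of peaks
n1 <- ncol(subPeak_data1); n2 <- ncol(subPeak_data2)  # number of cells in G1 and G2
# pct.1 / pct.2: fraction of G1 / G2 cells in which each peak is accessible.
ratio1 <- Matrix::rowSums(subPeak_data1) / n1
ratio2 <- Matrix::rowSums(subPeak_data2) / n2
# Permissive cutoff that keeps every tested peak, so filtering can be done afterwards.
cutoff <- "Pval <=1 & (Log2FC >= 0 | Log2FC <= 0)"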
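The pct.1 and pct.2 calculation can be checked on a small example. The sketch below uses an invented 3-peak by 6-cell binarized sparse matrix in place of the real PeakMatrix; the cell names and the group assignment (cells_g1, cells_g2) are hypothetical.

```r
library(Matrix)  # sparse matrices; the binarized PeakMatrix is also sparse

# Toy binarized peak-by-cell matrix: 3 peaks x 6 cells (invented values).
toy_mat <- Matrix(c(1, 1, 1, 0, 0, 0,
                    1, 0, 0, 1, 0, 0,
                    0, 0, 0, 0, 0, 1),
                  nrow = 3, byrow = TRUE, sparse = TRUE,
                  dimnames = list(paste0("peak", 1:3), paste0("cell", 1:6)))
cells_g1 <- paste0("cell", 1:3)  # hypothetical G1 cells
cells_g2 <- paste0("cell", 4:6)  # hypothetical G2 cells

# Fraction of cells in each group in which the peak is accessible.
pct.1 <- rowSums(toy_mat[, cells_g1, drop = FALSE]) / length(cells_g1)
pct.2 <- rowSums(toy_mat[, cells_g2, drop = FALSE]) / length(cells_g2)
print(round(cbind(pct.1, pct.2), 2))
```

Here peak1 is open in every G1 cell and no G2 cell, while peak3 is detected in only one of three G2 cells, which is the kind of low-coverage peak the pct filter is meant to catch.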
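3) Solution
Use cutoff <- "Pval <=1 & (Log2FC >= 0 | Log2FC <= 0)" to obtain the differential test results for all union peaks, then select the up- and down-regulated peaks with the filters below. Here dataset is that full result table with ratio1 and ratio2 added as the columns pct.1 and pct.2.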
up.peak<-subset(dataset,pct.1>=0.05 & Log2FC>= 0.25 & Pval<0.05)
down.peak<-subset(dataset,pct.2>=0.05 & Log2FC<= (-0.25) & Pval<0.05)
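The step of building dataset is not shown above, so here is a minimal sketch of one way to do it with toy data. It assumes both tables carry a shared peak identifier column (called peak here, a hypothetical name; something like seqnames:start-end would serve) so that they can be joined by key rather than by row order. All values are invented.

```r
# Toy differential table for all tested peaks (invented values).
diff_tab <- data.frame(
  peak   = paste0("peak", 1:4),
  Log2FC = c(1.5, 2.0, -1.0, -0.3),
  Pval   = c(0.001, 0.01, 0.02, 0.20),
  stringsAsFactors = FALSE
)
# Toy per-group detection rates, keyed by the same peak identifier.
pct_tab <- data.frame(
  peak  = paste0("peak", 1:4),
  pct.1 = c(0.30, 0.01, 0.02, 0.10),
  pct.2 = c(0.05, 0.00, 0.25, 0.12),
  stringsAsFactors = FALSE
)
# Join on the peak identifier rather than relying on row order.
dataset <- merge(diff_tab, pct_tab, by = "peak")

up.peak   <- subset(dataset, pct.1 >= 0.05 & Log2FC >= 0.25 & Pval < 0.05)
down.peak <- subset(dataset, pct.2 >= 0.05 & Log2FC <= (-0.25) & Pval < 0.05)
print(up.peak$peak)
print(down.peak$peak)
```

In this toy table peak2 has the largest fold change but is detected in only 1% of G1 cells, so the pct.1 filter removes it and only peak1 is kept as up-regulated; peak3 is the only down-regulated peak.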
References
Official tutorial: https://www.archrproject.com/bookdown/creating-arrow-files.html
Reference documentation: https://www.archrproject.com/reference/index.html (function parameter descriptions)
GitHub repository: https://github.com/GreenleafLab/ArchR/
Chinese tutorial: https://www.jianshu.com/p/793a7eab67c2 (Jianshu translation)
Code from the paper: https://github.com/GreenleafLab/ArchR_2020
ArchR paper: Granja, J. M., et al. (2021). "ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis." Nat Genet 53(3): 403-411. https://www.nature.com/articles/s41588-021-00790-6