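There are many ways to call peaks on epigenomic data when biological replicates are available; this is a short record of two of them.
1 Call peaks first, then intersect the peaks
Use IDR to identify the peaks that are consistent between replicates, then combine them with bedtools intersect.
For installing idr, see the reference link.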
The IDR (Irreproducible Discovery Rate) framework is a unified approach to measure the reproducibility of findings identified from replicate experiments and provide highly stable thresholds based on reproducibility.
Example
echo "idr --samples A${id}K4_peaks.broadPeak C${id}K4_peaks.broadPeak --input-file-type broadPeak --output-file ACK4-${id} --plot --rank p.value" >> ACK4.sh
# This produces a plot and a file of consistent peaks
NC_045731.1 18482515 18484118 . 1000 . -1 261.55000 -1 5.000000 5.000000 18482520 18483972 261.55000 18482515 18484118 677.33400
NW_022587827.1 45181 47485 . 1000 . -1 177.63400 -1 5.000000 5.000000 45181 46596 177.63400 45193 47485 414.37100
NC_045731.1 18515047 18516017 . 1000 . -1 134.81900 -1 5.000000 5.000000 18515068 18515901 134.81900 18515047 18516017 391.59900
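To keep only the reproducible peaks, the idr output can be filtered on its score column. The following is a minimal sketch, assuming the convention described in the idr README that column 5 holds the scaled score int(-125 * log2(IDR)) capped at 1000, so that IDR <= 0.05 corresponds to a score >= 540. The file names ACK4-demo and ACK4-demo.idr05.bed are placeholders, and the fourth input row (score 200) is made up purely to show the filter dropping a peak.

```shell
# Demo input: the three rows shown above plus one hypothetical low-scoring row.
cat > ACK4-demo <<'EOF'
NC_045731.1 18482515 18484118 . 1000 . -1 261.55000 -1 5.000000 5.000000 18482520 18483972 261.55000 18482515 18484118 677.33400
NW_022587827.1 45181 47485 . 1000 . -1 177.63400 -1 5.000000 5.000000 45181 46596 177.63400 45193 47485 414.37100
NC_045731.1 18515047 18516017 . 1000 . -1 134.81900 -1 5.000000 5.000000 18515068 18515901 134.81900 18515047 18516017 391.59900
NC_045731.1 100 900 . 200 . -1 3.10000 -1 0.400000 0.480000 100 900 3.10000 120 880 2.90000
EOF

# Keep peaks with scaled IDR score >= 540 (assumed to mean IDR <= 0.05),
# write chrom, start, end, score as tab-separated BED, sorted by position.
awk 'BEGIN { OFS = "\t" } $5 >= 540 { print $1, $2, $3, $5 }' ACK4-demo \
  | sort -k1,1 -k2,2n > ACK4-demo.idr05.bed

cat ACK4-demo.idr05.bed
```

The threshold 540 is only one common choice; a stricter or looser cutoff can be substituted in the awk condition.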
Peaks that show good consistency can then be combined with bedtools intersect:
bedtools intersect [OPTIONS] -a <FILE> \
-b <FILE1, FILE2, ..., FILEN>
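As a concrete illustration of the usage line above, here is a sketch that writes one intersect command into a script, following the same echo pattern used for ACK4.sh. All file names (ACK4-1.idr05.bed and so on) are hypothetical, and the choice of -u is an assumption about what "combining" should mean here: it reports each -a peak once if it overlaps a peak in any of the -b files.

```shell
# Hypothetical inputs: one filtered idr peak file per sample pair.
ref=ACK4-1.idr05.bed
others="ACK4-2.idr05.bed ACK4-3.idr05.bed"

# Write the command to a script so it can be inspected before it is run.
# -u: report each interval of -a once if it overlaps anything in the -b files.
echo "bedtools intersect -u -a ${ref} -b ${others} > ACK4.consensus.bed" > ACK4_intersect.sh

cat ACK4_intersect.sh
```

With several -b files, -u keeps a peak that is supported by at least one of them. If a peak must be present in every file, the intersect can be chained instead, feeding the output of one run in as -a of the next.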
2 Merge the BAM files first, then call peaks
First, use multiBamSummary from deepTools to compute the correlation between the BAM files of the biological replicates.
multiBamSummary computes the read coverages for genomic regions for typically two or more BAM files. The analysis can be performed for the entire genome by running the program in ‘bins’ mode. If you want to count the read coverage for specific regions only, use the BED-file mode instead. The standard output of multiBamSummary is a compressed numpy array (.npz). It can be directly used to calculate and visualize pairwise correlation values between the read coverages using the tool ‘plotCorrelation’.
multiBamSummary bins --bamfiles file1.bam file2.bam -o results.npz
## The resulting npz file can be used for principal component analysis and for plotCorrelation analysis
plotCorrelation -in x.npz --skipZeros --corMethod pearson --whatToPlot heatmap --colorMap RdYlBu_r --plotNumbers -o x.pdf --outFileCorMatrix x.tab
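The replicate QC steps above can be collected into one script, including the principal component analysis, for which no command is shown. This is a sketch only: the labels rep1 and rep2 and the output names cor.pdf, cor.tab, pca.pdf and replicate_qc.sh are hypothetical, and it assumes the deepTools options --labels (multiBamSummary) and -in / -o (plotPCA) are available in the installed version.

```shell
# Hypothetical replicate BAMs and display labels.
bams="file1.bam file2.bam"
labels="rep1 rep2"

# Write the three QC commands to a script, in the same style as ACK4.sh.
{
  echo "multiBamSummary bins --bamfiles ${bams} --labels ${labels} -o results.npz"
  echo "plotCorrelation -in results.npz --skipZeros --corMethod pearson --whatToPlot heatmap --plotNumbers -o cor.pdf --outFileCorMatrix cor.tab"
  echo "plotPCA -in results.npz -o pca.pdf"
} > replicate_qc.sh

cat replicate_qc.sh
```

Running `bash replicate_qc.sh` would then produce the correlation heatmap, the correlation matrix and the PCA plot from the same npz file.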
Replicates with a good correlation coefficient can have their BAM files merged:
samtools merge [options] -o <out.bam> [options] <in1.bam> ... <inN.bam>
samtools merge [options] <out.bam> <in1.bam> ... <inN.bam>
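Putting the merge and the subsequent peak calling together might look like the sketch below. Several things here are assumptions rather than part of the original notes: the replicate naming scheme (<group>_rep1.bam, <group>_rep2.bam), the group names A and C, the use of MACS2 in broad mode (guessed from the _peaks.broadPeak suffix used earlier), and the genome size value, which is a placeholder that must be replaced with the effective genome size of your species. The input BAMs are assumed to be coordinate-sorted.

```shell
# Placeholder effective genome size; replace with the value for your species.
gsize=2.0e9

# Start from an empty script, then append three commands per group.
: > merge_callpeak.sh
for group in A C; do
  {
    # Merge the two (hypothetically named) replicate BAMs of this group.
    echo "samtools merge -o ${group}_merged.bam ${group}_rep1.bam ${group}_rep2.bam"
    # Index the merged BAM so downstream tools can access it by region.
    echo "samtools index ${group}_merged.bam"
    # Call broad peaks on the merged BAM (MACS2 is an assumption here).
    echo "macs2 callpeak -t ${group}_merged.bam -f BAM -g ${gsize} --broad -n ${group}_merged"
  } >> merge_callpeak.sh
done

cat merge_callpeak.sh
```

If a control or input sample is available, it would normally be passed to the peak caller as well; that is left out here because the notes do not mention one.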
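Which approach is better depends on the sequencing depth and library quality of your own data. You can call peaks on each replicate first, look at the number of peaks, check their consistency with idr, and then decide.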