ATAC-seq and CUT&Tag: notes on calling peaks with biological replicates

There are quite a few ways to handle biological replicates when calling peaks on epigenomic data. Here I briefly record two of them.

1 Call peaks first, then take the intersection of the peaks

Use IDR to identify peaks that are reproducible between replicates, then combine the peak sets with bedtools intersect.

Installing idr: see the installation reference link.

The IDR (Irreproducible Discovery Rate) framework is a unified approach to measure the reproducibility of findings identified from replicate experiments and provide highly stable thresholds based on reproducibility.

Example

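# ${id} is a sample ID defined earlier, e.g. by an enclosing for loop (not shown)
echo "idr --samples A${id}K4_peaks.broadPeak C${id}K4_peaks.broadPeak --input-file-type broadPeak --output-file ACK4-${id} --plot --rank p.value" >> ACK4.sh
# Produces a diagnostic plot and a file of reproducible peaks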
(Figure: IDR diagnostic plot, ACK4-69.png)

The first lines of the output peak file (ACK4-69):
NC_045731.1     18482515        18484118        .       1000    .       -1      261.55000       -1      5.000000        5.000000        18482520        18483972        261.55000       18482515        18484118        677.33400
NW_022587827.1  45181   47485   .       1000    .       -1      177.63400       -1      5.000000        5.000000        45181   46596   177.63400       45193   47485   414.37100
NC_045731.1     18515047        18516017        .       1000    .       -1      134.81900       -1      5.000000        5.000000        18515068        18515901        134.81900       18515047        18516017        391.59900     
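Column 5 of the idr output holds a scaled IDR score. The idr README describes it as min(int(-125*log2(IDR)), 1000), so an IDR of 0.05 corresponds to a score of 540, and an IDR of 1 corresponds to 0. The minimal sketch below converts an IDR cutoff into that score and keeps only the output lines at or above it. The helper names and the 0.05 cutoff are illustrative assumptions and are not part of the idr package.

Filtering on column 5 does not depend on where the local and global IDR columns sit. Their position differs between narrowPeak and broadPeak output. In the broadPeak rows above, the two 5.000000 values appear to be those columns. Alternatively, idr itself accepts --idr-threshold.

```python
import math
from typing import Iterable, List

# Illustrative sketch: idr_to_score and filter_by_idr are hypothetical helpers,
# not functions provided by the idr package.


def idr_to_score(idr: float) -> int:
    """Scaled IDR score as described in the idr README: min(int(-125*log2(IDR)), 1000)."""
    if idr <= 0:
        return 1000  # log2(0) is undefined; an IDR of 0 gets the maximum score
    return min(int(-125 * math.log2(idr)), 1000)


def filter_by_idr(lines: Iterable[str], idr_threshold: float = 0.05) -> List[str]:
    """Keep idr output lines whose column-5 score is at least the score of
    idr_threshold, i.e. peaks with an IDR at or below roughly that value."""
    cutoff = idr_to_score(idr_threshold)
    kept = []
    for line in lines:
        fields = line.split()
        if fields and int(fields[4]) >= cutoff:  # column 5 = scaled IDR score
            kept.append(line)
    return kept
```

As a usage example, passing an open handle to the output file from the example above, such as filter_by_idr(open("ACK4-69")), returns the lines that pass the cutoff. Those lines can then be written out as a BED-like file.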

Peak sets with good reproducibility can then be combined with bedtools intersect, which reports where they overlap:

bedtools intersect [OPTIONS] -a <FILE> \
                             -b <FILE1, FILE2, ..., FILEN>
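With -u, bedtools intersect reports each -a interval once if it overlaps at least one -b interval. Adding -f 0.5 -r additionally requires the overlap to cover at least 50% of both intervals. As an illustration only, the sketch below reproduces that filtering logic with a naive pairwise comparison. It is not how bedtools is implemented, and the function names and the 0.5 fraction are assumptions.

```python
from typing import List, Tuple

# Illustrative sketch of the logic of `bedtools intersect -u [-f F [-r]]`;
# overlap_bp and intersect_unique are hypothetical names, not a bedtools API.
Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chromosomes or disjoint)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def intersect_unique(a_peaks: List[Interval], b_peaks: List[Interval],
                     min_frac: float = 0.0, reciprocal: bool = False) -> List[Interval]:
    """Keep each A peak once if it overlaps any B peak by at least min_frac of A's
    length (and of B's length too when reciprocal is True)."""
    kept = []
    for a in a_peaks:
        for b in b_peaks:
            ov = overlap_bp(a, b)
            if ov == 0:
                continue
            if ov < min_frac * (a[2] - a[1]):
                continue  # overlap too small relative to A (like -f)
            if reciprocal and ov < min_frac * (b[2] - b[1]):
                continue  # overlap too small relative to B (like -r)
            kept.append(a)
            break  # like -u: report A once, however many B peaks it overlaps
    return kept
```

The command-line counterpart, with hypothetical file names, would be along the lines of: bedtools intersect -a rep1.bed -b rep2.bed -u -f 0.5 -r. Requiring reciprocal overlap keeps a short peak from being counted as "shared" just because it falls inside a much broader peak in the other replicate.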

2 Merge the BAM files first, then call peaks

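First, check how well the BAM files of the biological replicates correlate, using deepTools: multiBamSummary computes the read coverage, and plotCorrelation computes and plots the correlations.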

multiBamSummary computes the read coverages for genomic regions for typically two or more BAM files. The analysis can be performed for the entire genome by running the program in ‘bins’ mode. If you want to count the read coverage for specific regions only, use the BED-file mode instead. The standard output of multiBamSummary is a compressed numpy array (.npz). It can be directly used to calculate and visualize pairwise correlation values between the read coverages using the tool ‘plotCorrelation’. Similarly, plotPCA can be used for principal component analysis of the read coverages using the .npz file.

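multiBamSummary bins --bamfiles file1.bam file2.bam -o results.npz
## The .npz output can be used for PCA (plotPCA) and for correlation analysis (plotCorrelation)
plotPCA -in results.npz -o results_pca.pdf
plotCorrelation -in results.npz --skipZeros --corMethod pearson --whatToPlot heatmap --colorMap RdYlBu_r --plotNumbers -o results_correlation.pdf --outFileCorMatrix results_correlation.tab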
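Conceptually, multiBamSummary bins produces a count matrix of bins by BAM files. plotCorrelation --corMethod pearson then computes pairwise correlations between the columns, and --skipZeros ignores bins with zero reads in all files. The pure-Python sketch below mirrors that idea on a toy matrix. The counts are made up for illustration, and the sketch does not reproduce deepTools' other options such as --log1p or --removeOutliers.

```python
import math
from typing import List, Sequence

# Illustrative sketch: pearson and correlation_matrix are hypothetical helpers
# that mimic the idea behind multiBamSummary bins + plotCorrelation --skipZeros.


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(counts: List[List[float]], skip_zeros: bool = True) -> List[List[float]]:
    """counts[i][j] = reads of sample j in bin i (rows = bins, columns = BAM files).
    With skip_zeros, bins that are zero in every sample are dropped first."""
    if skip_zeros:
        counts = [row for row in counts if any(v != 0 for v in row)]
    columns = list(zip(*counts))  # one tuple of bin counts per sample
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
```

For the toy matrix [[0, 0], [0, 0], [1, 1], [2, 3], [3, 2]], the correlation between the two samples is 0.5 with skip_zeros=True and about 0.85 without it. This shows why bins that are empty in every sample, which are common in sparse ATAC-seq and CUT&Tag data, can make replicates look more similar than they are.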

If the correlation coefficients are high, the replicate BAM files can be merged:

samtools merge [options] -o <out.bam> [options] <in1.bam> ... <inN.bam>
samtools merge [options] <out.bam> <in1.bam> ... <inN.bam>

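Which approach is better depends on your data, in particular sequencing depth and library quality. You can first call peaks on each replicate to check the number of peaks, and run IDR to check reproducibility, before deciding.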
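A minimal sketch of the check described above follows. It counts the peaks in each replicate's peak file, then compares the two counts and the fraction of each replicate's peaks that pass IDR. The function names are hypothetical, and no particular cutoff is implied, because what counts as acceptable depends on the data.

```python
from typing import Dict, Iterable

# Illustrative sketch: count_peaks and replicate_summary are hypothetical helpers.


def count_peaks(lines: Iterable[str]) -> int:
    """Count peak records in a BED-like file, skipping blank, comment, track and browser lines."""
    return sum(1 for line in lines
               if line.strip() and not line.startswith(("#", "track", "browser")))


def replicate_summary(n_rep1: int, n_rep2: int, n_reproducible: int) -> Dict[str, float]:
    """Compare two replicates' peak counts and the share of each that passes IDR."""
    return {
        "count_ratio": max(n_rep1, n_rep2) / min(n_rep1, n_rep2),
        "frac_rep1_reproducible": n_reproducible / n_rep1,
        "frac_rep2_reproducible": n_reproducible / n_rep2,
    }
```

For instance, count_peaks(open("A69K4_peaks.broadPeak")) would count the replicate peaks from the idr example. A large count_ratio or a small reproducible fraction suggests one replicate is much weaker than the other.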

