CUT&Tag 数据分析(2)

5. 文件格式转换

-bedpe 要求按read名排序把成对的排序到一起,warning可能是因为没排序,并不一定是数据本身的问题。

for i in `ls *_sortname.bam`;
do
    i=${i/_sortname.bam/};
    printf "prepare $i"
    ## Convert into bed file format
    bedtools bamtobed -i ${i}_sortname.bam -bedpe > /mnt/g/linux/20231129cuttag/bam/bed1/${i}_bowtie2.bed \
    ##Keep the read pairs that are on the same chromosome and fragment length less than 1000bp
    awk '$1==$4 && $6-$2 < 1000 {print $0}' /mnt/g/linux/20231129cuttag/bam/bed1/${i}_bowtie2.bed > /mnt/g/linux/20231129cuttag/bam/bed1/${i}_bowtie2.clean.bed \
    ## Only extract the fragment related columns
    cut -f 1,2,6 /mnt/g/linux/20231129cuttag/bam/bed1/${i}_bowtie2.clean.bed | sort -k1,1 -k2,2n -k3,3n  > /mnt/g/linux/20231129cuttag/bam/bed1/${i}_bowtie2.fragments.bed
    printf "finish $i";
done

6. 样本重复性

$samtools index ${i}_sortname.bam
$multiBamSummary bins --bamfiles *_sortname.bam --labels sample1 sample2 sample3  -out readCounts.npz --outRawCounts readCounts.tab
$plotCorrelation  -in readCounts.npz  -o results_heatmap.png  --corMethod spearman  -p heatmap

转录因子看组间差异和重复性有点鸡肋,没有组蛋白那么大的差别。我第一次做CUT&Tag没有spike-in校准,后续代码也就有些许差别。

7. CUT&Tag分析需要bedgraph格式

##基因组染色体大小的文件利用samtools进行提取
samtools faidx GRCh38.primary_assembly.genome.fa

该文件包含关于索引的 FASTA 文件中的序列名称、长度和字节偏移的信息,相对于实际序列数据而言是相当紧凑的。

for i in `ls *_bowtie2.fragments.bed`;
do
    i=${i/_bowtie2.fragments.bed/};
    printf "prepare $i"
    bedtools genomecov -bg -i ${i}_bowtie2.fragments.bed \
    -g /mnt/g/linux/chip/gencode/GRCh38.primary_assembly.genome.fa.fai > /mnt/g/linux/20231129cuttag/bam/bed1/bedgraph/${i}.fragments.bedgraph
    printf "finish $i";
done

seacr中包含对输入格式及转换的具体代码

Here is some example code for converting from a paired-end BAM to a fragment bedgraph file as described above:

bedtools bamtobed -bedpe -i $sample.bam > $sample.bed
awk '$1==$4 && $6-$2 < 1000 {print $0}' $sample.bed > $sample.clean.bed
cut -f 1,2,6 $sample.clean.bed | sort -k1,1 -k2,2n -k3,3n > $sample.fragments.bed
bedtools genomecov -bg -i $sample.fragments.bed -g my.genome > $sample.fragments.bedgraph

其中也包含对一些具体参数的描述,一定要看README.md

## Description of input fields:

Field 1: Target data bedgraph file in UCSC bedgraph format (https://genome.ucsc.edu/goldenpath/help/bedgraph.html) that omits regions containing 0 signal.

Field 2: Control (IgG) data bedgraph file to generate an empirical threshold for peak calling. Alternatively, a numeric threshold *n* between 0 and 1 returns the top *n* fraction of peaks based on total signal within peaks. 

Field 3: “norm” denotes normalization of control to target data, “non” skips this behavior. "norm" is recommended unless experimental and control data are already rigorously normalized to each other (e.g. via spike-in).

Field 4: “relaxed” uses a total signal threshold between the knee and peak of the total signal curve, and corresponds to the “relaxed” mode described in the text, whereas “stringent” uses the peak of the curve, and corresponds to “stringent” mode.

Field 5: Output prefix

没有spike-in,一定选norm.

seacr="/mnt/g/linux/SEACR-master/SEACR_1.3.sh"
for i in `ls *.fragments.bedgraph`;
do
    i=${i/.fragments.bedgraph/};
    printf "prepare $i"
    #Calls enriched regions in target data using normalized IgG control track with stringent threshold
    bash $seacr ${i}.fragments.bedgraph IgG.fragments.bedgraph norm stringent /mnt/g/linux/20231129cuttag/bam/bed1/bedgraph/${i}.seacr_control.peaks
    printf "finish $i";
done

homer验证或者寻找motif

perl /mnt/e/linux/configureHomer.pl -list#基因组hg38有无,无需要下载
findMotifsGenome.pl /mnt/g/linux/20231129cuttag/bam/bed1/bedgraph/sample-1.seacr_co
ntrol.peaks.stringent.bed /mnt/g/linux/data/genomes/hg38 /mnt/g/linux/20231129cuttag/homer/sample-1_motif -len 8,10,
12

7-2. bam文件合并及motif查找
生物学重复样本可以先进行bam文件合并

samtools merge -o merge/KD_sortname.bam KD1_sortname.bam KD2_sortname.bam
macs2 callpeak -t WT_sortname.bam -c KD_sortname.bam -g hs -f BAMPE -n kd_wt --keep-dup=all --outdir callpeak
#BED格式,包含peak summits(peak最高点)位置;如果想寻找结合位点#的motifs ,建议使用此文件。
findMotifsGenome.pl WT_kd_summits.bed /mnt/e/linux/data/genomes/hg38 WT_kd_motif -len 8,10,12
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

友情链接更多精彩内容