使用bismark进行甲基化分析并使用IGV对结果进行可视化

Bismark是进行甲基化分析过程中常用到的软件，可以将bisulfite处理后的测序reads比对到参考基因组上，得到参考基因组相应位置的甲基化水平。

我的甲基化数据为WGBS所测得的数据，使用Bismark call methylation分为以下几步：

1.处理WGBS测序数据，使用trimmomatic去接头；

2.为参考基因组建立索引；（基因组约450M，这一步用时15min）;

bismark_genome_preparation --bowtie2 yourdir ##yourdir中存放genome

3.比对甲基化数据；（这步用时较久，每个个体大概花费3h)

nohup bismark --bowtie2 -N 0 -L 20 --quiet --un --ambiguous --bam --parallel 30 -o outdir refdir -1 file1 -2 file2 &

4.去冗余；（0.5h）

nohup deduplicate_bismark -p -bam alignment_bam &

对3中比对的bam文件进行去冗余

5.提取甲基化信息;（4h）

nohup bismark_methylation_extractor -p --parallel 40 --comprehensive --no_overlap--bedGraph --cytosine_report --CX_context --split_by_chromosome --counts--buffer_size 20G --report --samtools_path yourpath/smrtlink_v9/smrtcmds/bin --genome_folder refdir bam -o outdir &

##--split_by_chromosome参数可以将结果按照染色体拆分，便于后续可视化。

结果生成多个文件，主要使用每条染色体的甲基化文件，例：

*_bismark_bt2_pe.deduplicated.CX_report.txt.chrChr14.CX_report.txt

文件格式如下：

各列分别为：

第一列：染色体；

第二列：染色体上具体位点（以1开始）；

第三列：正负链；

第四列：比对到该位点发生甲基化的reads数；

第五列：比对到该位点未发生甲基化的read数；

第六列：甲基化类型，有三种：CG, CHG, CHH；

第七列：该位点的三核苷酸序列；

6.IGV可视化

为了使用IGV进行可视化，需要构建bedgraph格式文件，bedgraph格式文件可以在IGV中对continuous-valued 数据进行可视化，适用于转录组数据或概率得分（The bedGraph format allows display of continuous-valued data in track format. This display type is useful for probability scores and transcriptome data. ），因此也适合展示甲基化率。bedgraph文件格式如下：

chromA chromStartA chromEndA dataValueA chromB chromStartB chromEndB dataValueB

第一列：染色体；

第二列：基因/位点在染色体上的起始位置；

第三列：基因/位点在染色体上的结束位置；

第四列：数值；（本次分析中为甲基化率）

bismark_methylation_extractor会生成一个总的bedgraph文件：

*_P_1_bismark_bt2_pe.deduplicated.bedGraph.gz

但该文件是整个基因组所有甲基化C位点，没有分CG/CHG/CHH信息，也不利于后续可视化，因此根据5提到的CX_report.txt（已按照染色体拆分）文件拆分CG,CHG,CHH三类甲基化后转换成bedgraph格式文件。

首先使用bismarkCXmethykit.pl脚本进行转换，转换后出现以CG_methykit.txt，CHG_methykit.txt，CHH_methykit.txt为后缀的拆分甲基化类型的文件，文件格式如下：

再使用R脚本转换成bedgraph文件，只需保留chr列，base列和freqC列，R脚本如下：

Args <- commandArgs(T)

Args[1]

myfile=read.table(Args[1],header=T)

outfile=sub("_P_1_bismark_bt2_pe.deduplicated.CX_report.txt.chrChr19.CX_report.txt", "",Args[1])

file_out=sub("_methykit.txt", ".bedgraph", outfile ) ###输出文件

chr<-as.character(myfile$chr)

pos<-as.integer(myfile$base)

pos2<-as.integer(myfile$base)

freq<-as.numeric(myfile$freqC)

outf<-data.frame(chr,pos,pos2,freq, check.names = F) ###合并以上四列，check.names = F可以防止将字符串转化为其他内容。

outf$chr<-factor(outf$chr)

dim(myfile)

dim(outf) ###检查一下新文件和原文件行数是否相等

write.table(outf, file=file_out, quote=F,col.names = F, row.names = F) ##输出

转换后生成以CG.bedgraph/CHG.bedgraph/CHH.bedgraph为后缀的文件，文件内容如下：

符合IGV需求的格式，可以用于可视化。

##由于有时候整个基因组的bedgraph文件太大，我们只想看某条染色体的情况，可以先使用linux提取某条染色体以及某类C（CG/CHG/CHH）的信息，然后转成bedgraph格式。可以大大降低运行时间。

export PATH=/gpfs/home/fffu/anaconda3/envs/R4.2/bin:$PATH

###提取chrA01的CG信息

for file in ./*R_clean_1_bismark_bt2_pe.deduplicated.CX_report.txt

do

sample=${file##*/}

short=${sample%%R_clean_1_bismark_bt2_pe.deduplicated.CX_report.txt}

outall=$short"CG.report"

outchr1=$short"chr1_CG.report"

awk '$6 == "CG"' $file > $outall

awk '$1 == "chrA01"' $outall > $outchr1

done

### bismarkCXmethykit.pl计算甲基化率

for file in ./*chr1_CG.report

do

perl bismarkCXmethykit.pl $file

done

### bismarkCXmethykit.pl结果转为bedgraph结果

for file in ./*CG_methykit.txt

do

Rscript methykit2bedgraph.R $file

done

###生成的结果可用于IGV可视化。

bismark介绍见官网：https://github.com/FelixKrueger/Bismark/tree/master/Docs

bedgraph介绍：

http://genome.ucsc.edu/goldenPath/help/bedgraph.html

bismarkCXmethykit.pl脚本参考：

https://www.jianshu.com/p/5c27908ff1e3

使用bismark进行甲基化分析并使用IGV对结果进行可视化

推荐阅读更多精彩内容