比对后bam文件质控
使用之前搭建的WES分析小环境:
只保留软件及参考基因组数据注释相关文件
通过Filezilla上传所有原始比对后的bam文件。第一步质控:
参考 :https://www.yuque.com/biotrainee/wes/ul563z
第一步构建config文件:
basename -a ~/CHD_pooling_seq/*.bam >file_basename
cat file_basename | while read id ; do sample=${id%%.dedup.*}; echo $sample; done > config
cat config
2种方法进行质控:
samtools stats
samtools stats 结果可视化plot-bamstats需要先安装gnuplot
进入project目录,激活wes小环境:
conda install -c bioconda gnuplot
安装完成后,开始质控:
## 1.samtools stats 结果可视化
cat config | while read id
do
bam=~/CHD_pooling_seq/${id}.dedup.bam
samtools stats -@ 16 --reference ~/reference/genome/Homo_sapiens_assembly38.fasta ${bam} > ./1.qc/stats/${id}.stat
plot-bamstats -p ./1.qc/stats/${id} ./1.qc/stats/${id}.stat
done
生成的结果有png格式的图表,还有html文档:
image.png
qualimap
用qualimap
软件来查看基因组覆盖深度等信息,先安装软件:
conda install -c bioconda qualimap
进行质控:
cat config | while read id
do
qualimap bamqc --java-mem-size=10G -gff ~/annotation/variation/GATK/hg38.exon.bed -nr 100000 -nw 500 -nt 16 -bam ~/CHD_pooling_seq/${id}.dedup.bam -outdir ./1.qc/clean_qc/${id}
done
运行结果如下所示:
(wes) root@1100150:~/project# cat config | while read id
> do
> qualimap bamqc --java-mem-size=10G -gff ~/annotation/variation/GATK/hg38.exon.bed -nr 100000 -nw 500 -nt 16 -bam ~/CHD_pooling_seq/${id}.dedup.bam -outdir ./1.qc/clean_qc/${id}
> done
Display:
Java memory size is set to 10G
Launching application...
QualiMap v.2.2.2-dev
Built on 2016-12-11 14:41
Selected tool: bamqc
Available memory (Mb): 32
Max memory (Mb): 9544
Starting bam qc....
WARNING: BAM index file /root/CHD_pooling_seq/C1.dedup.bam.bai is older than BAM /root/CHD_pooling_seq/C1.dedup.bam
Loading sam header...
Mon Jul 13 07:10:53 UTC 2020 WARNING @HD line is not presented in the BAM file header.
Loading locator...
Loading reference...
Number of windows: 500, effective number of windows: 592
Chunk of reads size: 100000
Number of threads: 16
Initializing regions from /root/annotation/variation/GATK/hg38.exon.bed.....
Found 199208 regions
Filling region references...
Processed 59 out of 592 windows...
Processed 118 out of 592 windows...
Processed 177 out of 592 windows...
Processed 236 out of 592 windows...
Processed 295 out of 592 windows...
Processed 354 out of 592 windows...
Processed 413 out of 592 windows...
Processed 472 out of 592 windows...
Processed 531 out of 592 windows...
Processed 590 out of 592 windows...
Total processed windows:592
Number of reads: 13282037
Number of valid reads: 12941766
Number of correct strand reads:0
Mon Jul 13 07:13:59 UTC 2020 WARNING SAMRecordParser marked 21 problematic reads.
Inside of regions...
Num mapped reads: 224687
Num mapped first of pair: 112411
Num mapped second of pair: 112276
Num singletons: 593
Time taken to analyze reads: 186
Computing descriptors...
numberOfMappedBases: 12585877
referenceSize: 3137161264
numberOfSequencedBases: 12582492
numberOfAs: 3426917
Computing per chromosome statistics...
Computing histograms...
Overall analysis time: 187
end of bam qc
Computing report...
Writing HTML report...
HTML report created successfully
Finished
Display:
Java memory size is set to 10G
Launching application...
QualiMap v.2.2.2-dev
Built on 2016-12-11 14:41
...