Installing and running MultiQC and evaluating sequencing data quality

I. Installing MultiQC with conda

1. Install conda

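2. Create a Python 2.7 environment
(Note: current MultiQC releases require Python 3. In a Python 2.7 environment conda can only resolve an old MultiQC version, so using python=3 in the command below avoids this problem.)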

conda create --name python2 python=2.7 -c https://mirrors.ustc.edu.cn/anaconda/cloud/bioconda/ -y
conda activate python2

multiqc1.PNG

multiqc2.PNG

3. Install MultiQC with conda

conda install multiqc -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda

multiqc4.PNG

4. Check that MultiQC was installed successfully

multiqc -h   # activate the conda environment (conda activate python2) before running multiqc

multiqc 01.PNG
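If this check is part of a script, it can help to confirm that a command is on PATH before using it. Below is a minimal bash sketch. `check_tool` is a hypothetical helper name and is not part of MultiQC or conda.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MultiQC or conda): report whether a
# command is reachable on PATH; returns 0 if found, 1 otherwise.
check_tool() {
  local tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $(command -v "$tool")"
    return 0
  else
    echo "$tool not found; activate the conda environment first (conda activate python2)" >&2
    return 1
  fi
}

# Usage: stop early if multiqc is not available in the current shell.
# check_tool multiqc || exit 1
```

The usage line is commented out so that sourcing the snippet never exits the shell.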

II. Obtaining the FASTQ files

1. Find two sequencing runs in NCBI SRA and download them with prefetch. Small runs are preferable because they process quickly.

prefetch SRR5987926
prefetch SRR5987998

multiqc02.PNG

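2. Convert the SRA files to FASTQ format. --split-files writes each read of a pair to its own file (_1 and _2), and --gzip compresses the output.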

fastq-dump --gzip --split-files SRR5987926
fastq-dump --gzip --split-files SRR5987998

multiqc03.PNG
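Steps 1 and 2 can also be scripted as one loop over the accessions. This is a minimal bash sketch that assumes prefetch and fastq-dump (SRA Toolkit) are on PATH. The `run` and `fetch_and_convert` helpers and the `DRY_RUN` switch are hypothetical. With the default `DRY_RUN=1`, the commands are printed but not executed.

```shell
#!/usr/bin/env bash
# Hypothetical download-and-convert loop; accessions taken from the steps above.
ACCESSIONS=(SRR5987926 SRR5987998)
DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands, 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"            # dry run: show the command
  else
    "$@"                 # real run: execute the command
  fi
}

fetch_and_convert() {
  local acc
  for acc in "$@"; do
    run prefetch "$acc"                          # download the .sra file
    run fastq-dump --gzip --split-files "$acc"   # paired-end runs give <acc>_1.fastq.gz and <acc>_2.fastq.gz
  done
}

fetch_and_convert "${ACCESSIONS[@]}"
```

To download and convert for real, set DRY_RUN=0 in the environment before running the script.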

3. Assess data quality with FastQC

fastqc SRR5987926_1.fastq.gz SRR5987926_2.fastq.gz SRR5987998_1.fastq.gz SRR5987998_2.fastq.gz

multiqc04.PNG

01.PNG
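By default, FastQC writes a `<name>_fastqc.html` and a `<name>_fastqc.zip` next to each input file, and MultiQC builds its report from these outputs. The minimal bash sketch below checks that every `*.fastq.gz` in a directory has a matching `_fastqc.zip` before MultiQC runs. `missing_fastqc_reports` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: every <name>.fastq.gz in DIR should have a
# <name>_fastqc.zip written by FastQC. Returns 0 only if none are missing.
missing_fastqc_reports() {
  local dir="$1" f base missing=0
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue                 # no match: the glob stays literal
    base="$(basename "$f" .fastq.gz)"       # SRR5987926_1.fastq.gz -> SRR5987926_1
    if [ ! -e "$dir/${base}_fastqc.zip" ]; then
      echo "missing report for $f"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage: missing_fastqc_reports . && multiqc .
```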

III. Aggregating the results with MultiQC

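Run MultiQC in the current directory, which contains the FastQC output. MultiQC searches the given directory recursively for results: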

multiqc .

02.PNG

This produces a multiqc_data directory and a multiqc_report.html file. Download multiqc_report.html and open it in a web browser.
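Besides the HTML report, multiqc_data holds the same numbers as tab-separated text files (for example multiqc_general_stats.txt), which makes them easy to use in scripts. Below is a minimal sketch that prints one column by its header name. `get_column` is a hypothetical helper. Column names differ between MultiQC versions, so list the header first with `head -n 1 FILE | tr '\t' '\n'`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<first column>\t<named column>" for every data
# row of a tab-separated table; exits 1 if the header name is not found.
get_column() {
  local file="$1" name="$2"
  awk -v name="$name" '
    BEGIN { FS = "\t" }
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == name) col = i
      if (!col) { print "column not found: " name | "cat 1>&2"; exit 1 }
      next
    }
    { print $1 "\t" $col }      # sample name, then the requested value
  ' "$file"
}

# Usage (assuming this file has a %GC column in your MultiQC version):
# get_column multiqc_data/multiqc_fastqc.txt '%GC'
```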

IV. Interpreting the results

1. General statistics for all samples

001.PNG

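%Dups: percentage of duplicate reads. %GC: G and C bases as a percentage of all bases. This value should match the expected GC content of the organism; a lower value is not better in itself. M Seqs: total number of reads, in millions.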
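To see where M Seqs and %GC come from, both can be recomputed directly from a FASTQ file. M Seqs is the read count divided by one million. %GC is the number of G and C bases divided by the number of A, C, G and T bases. The minimal bash/awk sketch below assumes standard 4-line FASTQ records and ignores N and other ambiguous bases. `fastq_stats` is a hypothetical helper, and its results may differ slightly from FastQC's own counting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read count, M Seqs and %GC for one gzipped FASTQ file.
fastq_stats() {
  gzip -dc < "$1" | awk '
    NR % 4 == 2 {                 # line 2 of each 4-line record is the sequence
      reads++
      s = toupper($0)
      gc += gsub(/[GC]/, "", s)   # gsub returns the number of replacements
      at += gsub(/[AT]/, "", s)   # N and other symbols are not counted
    }
    END {
      if (gc + at == 0) { print "no reads"; exit 1 }
      printf "reads=%d M_Seqs=%.2f GC=%.1f\n", reads, reads / 1e6, 100 * gc / (gc + at)
    }'
}

# Usage: fastq_stats SRR5987926_1.fastq.gz
```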

2. Sequence counts

002.PNG

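This plot shows the number of unique and duplicate reads in each sample, together with their percentages.
According to the plot, all four files contain a large proportion of duplicate reads.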
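The unique/duplicate split can be approximated by counting exact sequence matches in a FASTQ file. This is a simplified approximation and not FastQC's duplication algorithm, which tracks only a subset of sequences and extrapolates. Its numbers therefore will not match the report exactly. `dup_percent` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exact-match duplicate estimate for one gzipped FASTQ.
# The first copy of each sequence counts as unique; every further copy counts as a duplicate.
dup_percent() {
  gzip -dc < "$1" | awk '
    NR % 4 == 2 { total++; if (seen[$0]++) dups++ }
    END {
      if (total == 0) { print "no reads"; exit 1 }
      printf "unique=%d duplicate=%d dup%%=%.1f\n", total - dups, dups, 100 * dups / total
    }'
}
```

This sketch keeps every distinct sequence in memory. For large files, try it on a subset first, for example the first 400,000 lines (100,000 reads).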

3. Mean sequencing quality at each base position of the reads

003.PNG

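x-axis: base position in the read
y-axis: Phred quality score, Q = -10 × log10(p), where p is the probability that the base call is wrong. Q = 40 therefore means p = 0.0001, which indicates very high sequencing quality.
Green band: very good quality; orange band: acceptable quality; red band: poor quality.
For all four samples, the mean quality line stays within the green band across all 140 base positions, so the sequencing quality is very good.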
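The Phred relationship and the per-position mean can be checked by hand with a small bash/awk sketch. The helper names are hypothetical. The band cut-offs (28 and above green, 20 to 28 orange, below 20 red) are assumed to match FastQC's defaults. The quality strings are assumed to use Phred+33 (Sanger / Illumina 1.8+) encoding.

```shell
#!/usr/bin/env bash
# Q = -10 * log10(p)  <=>  p = 10^(-Q/10)
phred_to_p() {
  awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# Colour band for a mean quality value (assumed FastQC default cut-offs).
quality_zone() {
  awk -v q="$1" 'BEGIN {
    if (q >= 28) print "green"
    else if (q >= 20) print "orange"
    else print "red"
  }'
}

# Mean quality per base position for one gzipped FASTQ (Phred+33 assumed).
mean_quality_by_position() {
  gzip -dc < "$1" | awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }   # character -> ASCII code
    NR % 4 == 0 {                       # line 4 of each record is the quality string
      for (i = 1; i <= length($0); i++) {
        sum[i] += ord[substr($0, i, 1)] - 33
        n[i]++
      }
      if (length($0) > maxlen) maxlen = length($0)
    }
    END { for (i = 1; i <= maxlen; i++) printf "%d\t%.1f\n", i, sum[i] / n[i] }'
}

# Usage: phred_to_p 40 prints 0.0001; quality_zone 35 prints green
```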