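Author: 202031107010173 He Sicheng (何思成)
Record the process of installing and running MultiQC, in Markdown or as formatted text. Find two or more FASTQ files and use MultiQC to assess their data quality together.
Requirements:
1) Record the procedure step by step;
2) Each step includes the source commands and a screenshot of the resulting output;
3) Upload the notes to Jianshu or Tencent Docs and put the link in the answer field. Alternatively, write them in Word (mind the formatting) and upload the Word file as an attachment in the answer field.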
Install conda and create a Python 2 environment
(base) 202031107010173@xiaoming-HP:~$ cd ~/Biosofts
(base) 202031107010173@xiaoming-HP:~/Biosofts$ conda create --name python2 python=2.7 -c https://mirrors.ustc.edu.cn/anaconda/cloud/bioconda/ -y
Activate the python2 environment
(base) 202031107010173@xiaoming-HP:~/Biosofts$ conda activate python2
(python2) 202031107010173@xiaoming-HP:~/Biosofts$
Install MultiQC
(python2) 202031107010173@xiaoming-HP:~/Biosofts$ conda install multiqc -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
Run MultiQC
(python2) 202031107010173@xiaoming-HP:~/Biosofts$ multiqc .
/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/utils/config.py:44: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
configs = yaml.load(f)
/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/utils/config.py:50: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
sp = yaml.load(f)
[INFO ] multiqc : This is MultiQC v1.0.dev0
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching '.'
[WARNING] multiqc : No analysis results found. Cleaning up..
[INFO ] multiqc : MultiQC complete
The run produced no report, so list the files in the directory
(python2) 202031107010173@xiaoming-HP:~/Biosofts$ ll
total 632892
drwxrwxr-x 6 202031107010173 202031107010173 4096 10月 8 13:44 ./
drwxr-xr-x 18 202031107010173 202031107010173 4096 10月 8 13:18 ../
-rw-rw-r-- 1 202031107010173 202031107010173 570853747 5月 14 2021 Anaconda3-2021.05-Linux-x86_64.sh
drwxrwxr-x 2 202031107010173 202031107010173 4096 9月 24 15:27 fastqc/
drwxrwxr-x 8 202031107010173 202031107010173 4096 1月 10 2018 FastQC/
-rw-rw-r-- 1 202031107010173 202031107010173 10254666 1月 16 2020 fastqc_v0.11.7.zip
-rw-rw-r-- 1 202031107010173 202031107010173 2530112 9月 24 11:10 GCF_946151055.1_Q3570_genomic.gff
-rw-rw-r-- 1 202031107010173 202031107010173 1157510 9月 24 04:27 GCF_946183555.1_B129_S48_genomic.gff
-rwxrwxr-x 1 202031107010173 202031107010173 63248534 6月 12 2021 ibm-aspera-connect_4.0.2.38_linux.sh*
drwxrwxr-x 4 202031107010173 202031107010173 4096 10月 4 10:41 SPAdes-3.12.0-Linux/
drwxrwxr-x 5 202031107010173 202031107010173 4096 8月 17 2021 sratoolkit.2.11.1-ubuntu64/
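The listing above can also be checked with a small helper. This is only a sketch: it assumes the usual FastQC naming (a `*_fastqc.zip` archive, or a `fastqc_data.txt` file inside an unzipped report folder), and the function name `count_fastqc_reports` is made up for illustration.

```shell
# Count FastQC outputs under a directory that MultiQC could parse.
# Assumption: reports use the usual names, i.e. *_fastqc.zip archives
# or fastqc_data.txt inside an unzipped report folder.
count_fastqc_reports() {
    dir="${1:-.}"
    find "$dir" -type f \( -name '*_fastqc.zip' -o -name 'fastqc_data.txt' \) \
        | wc -l | tr -d ' '
}

# Prints 0 when there is nothing for the fastqc module to read.
count_fastqc_reports .
```

A count of 0 is consistent with the "No analysis results found" warning.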
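The current directory contains no FastQC results, so MultiQC cannot find anything to analyze.
So we first need to download some sequence files from NCBI. (Note: the files fetched below are genome assemblies in FASTA format, with the .fna extension, not FASTQ reads.)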
(python2) 202031107010173@xiaoming-HP:~/Biosofts$ wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/946/151/055/GCF_946151055.1_Q3570/GCF_946151055.1_Q3570_genomic.fna.gz
--2022-10-08 13:59:07-- https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/946/151/055/GCF_946151055.1_Q3570/GCF_946151055.1_Q3570_genomic.fna.gz
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 165.112.9.230, 165.112.9.229, 2607:f220:41f:250::229, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|165.112.9.230|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1369696 (1.3M) [application/x-gzip]
Saving to: ‘GCF_946151055.1_Q3570_genomic.fna.gz’
GCF_946151055.1_Q3570_genomic.fna.gz 100%[==============================================================================>] 1.31M 270KB/s in 5.0s
2022-10-08 13:59:14 (270 KB/s) - ‘GCF_946151055.1_Q3570_genomic.fna.gz’ saved [1369696/1369696]
(python2) 202031107010173@xiaoming-HP:~/Biosofts$ wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/946/151/065/GCF_946151065.1_Q6965/GCF_946151065.1_Q6965_genomic.fna.gz
--2022-10-08 14:02:23-- https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/946/151/065/GCF_946151065.1_Q6965/GCF_946151065.1_Q6965_genomic.fna.gz
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 165.112.9.230, 2607:f220:41f:250::230, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 794477 (776K) [application/x-gzip]
Saving to: ‘GCF_946151065.1_Q6965_genomic.fna.gz’
GCF_946151065.1_Q6965_genomic.fna.gz 100%[==============================================================================>] 775.86K 314KB/s in 2.5s
2022-10-08 14:02:26 (314 KB/s) - ‘GCF_946151065.1_Q6965_genomic.fna.gz’ saved [794477/794477]
After downloading, decompress the two archives
(python2) 202031107010173@xiaoming-HP:~/Biosofts$ gunzip GCF_946151055.1_Q3570_genomic.fna.gz
(python2) 202031107010173@xiaoming-HP:~/Biosofts$ gunzip GCF_946151065.1_Q6965_genomic.fna.gz
(python2) 202031107010173@xiaoming-HP:~/Biosofts$ ll
total 636652
drwxrwxr-x 6 202031107010173 202031107010173 4096 10月 8 14:07 ./
drwxr-xr-x 18 202031107010173 202031107010173 4096 10月 8 13:18 ../
-rw-rw-r-- 1 202031107010173 202031107010173 570853747 5月 14 2021 Anaconda3-2021.05-Linux-x86_64.sh
drwxrwxr-x 2 202031107010173 202031107010173 4096 9月 24 15:27 fastqc/
drwxrwxr-x 8 202031107010173 202031107010173 4096 1月 10 2018 FastQC/
-rw-rw-r-- 1 202031107010173 202031107010173 10254666 1月 16 2020 fastqc_v0.11.7.zip
-rw-rw-r-- 1 202031107010173 202031107010173 4650900 9月 24 11:10 GCF_946151055.1_Q3570_genomic.fna
-rw-rw-r-- 1 202031107010173 202031107010173 2885939 9月 24 11:10 GCF_946151065.1_Q6965_genomic.fna
-rwxrwxr-x 1 202031107010173 202031107010173 63248534 6月 12 2021 ibm-aspera-connect_4.0.2.38_linux.sh*
drwxrwxr-x 4 202031107010173 202031107010173 4096 10月 4 10:41 SPAdes-3.12.0-Linux/
drwxrwxr-x 5 202031107010173 202031107010173 4096 8月 17 2021 sratoolkit.2.11.1-ubuntu64/
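Before going further, it may help to confirm what kind of sequence file was downloaded. The sketch below guesses the format from the first character of an uncompressed file: FASTA records start with `>` and FASTQ records start with `@`. The function name `seq_format` is hypothetical.

```shell
# Guess a sequence file's format from its first character.
# FASTA records start with '>', FASTQ records start with '@'.
# Works on uncompressed files only.
seq_format() {
    first_char="$(head -c 1 "$1")"
    case "$first_char" in
        '>') echo "fasta" ;;
        '@') echo "fastq" ;;
        *)   echo "unknown" ;;
    esac
}

# Usage:
# seq_format GCF_946151055.1_Q3570_genomic.fna
```

A genome assembly in `.fna` form is FASTA, so it carries no per-base quality scores for FastQC or MultiQC to summarize.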
Running multiqc . again still finds no analysis data
(python2) 202031107010173@xiaoming-HP:~/Biosofts$ multiqc .
/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/utils/config.py:44: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
configs = yaml.load(f)
/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/utils/config.py:50: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
sp = yaml.load(f)
[INFO ] multiqc : This is MultiQC v1.0.dev0
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching '.'
[WARNING] multiqc : No analysis results found. Cleaning up..
[INFO ] multiqc : MultiQC complete
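To find the cause, first check the MultiQC installation. This reveals a problem (note that the prompt shows the base environment here, whereas MultiQC was installed into the python2 environment)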
(base) 202031107010173@xiaoming-HP:~/Biosofts$ multiqc --version
multiqc: command not found
(base) 202031107010173@xiaoming-HP:~/Biosofts$ multiqc -h
multiqc: command not found
So try reinstalling
conda install multiqc -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
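A "command not found" message can simply mean the tool lives in a different conda environment than the active one. The following sketch reports whether a command is on PATH and which environment is active; `check_tool` is a made-up helper name, and `CONDA_DEFAULT_ENV` is the variable that `conda activate` sets.

```shell
# Report whether a command is on PATH in the current shell, and which
# conda environment (if any) is active.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found (active env: ${CONDA_DEFAULT_ENV:-none})"
    fi
}

check_tool multiqc
```

If the tool is reported missing while the active environment is base, running `conda activate python2` first may be all that is needed.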
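Returning to this task a few days later, I found ready-made FASTQ files under /disk1/shares/Seqs/, and realized that the files I had downloaded earlier were unsuitable. This time I copied the FASTQ files into ~/Biosofts/ and tested multiqc again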
(base) 202031107010173@xiaoming-HP:~/Biosofts$ conda activate python2
(python2) 202031107010173@xiaoming-HP:~/Biosofts$ multiqc .
/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/utils/config.py:44: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
configs = yaml.load(f)
/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/utils/config.py:50: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
sp = yaml.load(f)
[INFO ] multiqc : This is MultiQC v1.0.dev0
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching '.'
[ERROR ] multiqc : Oops! The 'prokka' MultiQC module broke...
Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues
(if possible, include a log file that triggers the error)
============================================================
Module prokka raised an exception: Traceback (most recent call last):
File "/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/EGG-INFO/scripts/multiqc", line 346, in multiqc
output = mod()
File "/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/modules/prokka/prokka.py", line 31, in __init__
self.parse_prokka(f)
File "/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/modules/prokka/prokka.py", line 87, in parse_prokka
first_line = f['f'].readline()
File "/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/codecs.py", line 314, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd4 in position 3116: invalid continuation byte
============================================================
[WARNING] multiqc : No analysis results found. Cleaning up..
[INFO ] multiqc : MultiQC complete
Failed again o(╥﹏╥)o
Let's look for the cause once more
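The traceback is a UnicodeDecodeError, which suggests that some file in the scanned directory is not valid UTF-8. One possible way to locate such files is sketched below. It relies on `iconv` exiting with a non-zero status on invalid input; the function name `list_non_utf8` is hypothetical, and binary files (installers, archives) will be listed too.

```shell
# List regular files directly inside a directory that are not valid UTF-8.
# iconv exits non-zero when it meets an invalid byte sequence.
list_non_utf8() {
    dir="${1:-.}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
            echo "$f"
        fi
    done
}

# Usage:
# list_non_utf8 ~/Biosofts
```

Running MultiQC in a directory that holds only the QC results would avoid scanning unrelated files in the first place.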
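MultiQC summarizes the output of tools such as FastQC rather than reading raw FASTQ files, so we first run FastQC on the files for quality control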
(base) 202031107010173@xiaoming-HP:~/Biosofts$ fastqc Akle_TTAGGC_L004_R1_001.fastq
Started analysis of Akle_TTAGGC_L004_R1_001.fastq
Approx 5% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 10% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 15% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 20% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 25% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 30% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 35% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 40% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 45% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 50% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 55% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 60% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 65% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 70% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 75% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 80% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 85% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 90% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 95% complete for Akle_TTAGGC_L004_R1_001.fastq
Approx 100% complete for Akle_TTAGGC_L004_R1_001.fastq
Analysis complete for Akle_TTAGGC_L004_R1_001.fastq
(base) 202031107010173@xiaoming-HP:~/Biosofts$ fastqc Akle_TTAGGC_L004_R2_001.fastq
Started analysis of Akle_TTAGGC_L004_R2_001.fastq
Approx 5% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 10% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 15% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 20% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 25% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 30% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 35% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 40% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 45% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 50% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 55% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 60% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 65% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 70% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 75% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 80% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 85% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 90% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 95% complete for Akle_TTAGGC_L004_R2_001.fastq
Approx 100% complete for Akle_TTAGGC_L004_R2_001.fastq
Analysis complete for Akle_TTAGGC_L004_R2_001.fastq
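The two FastQC runs above could also be done in one call, with all reports written to a dedicated folder. This is a sketch only: `run_fastqc_all` and the folder name `fastqc_out` are made up for illustration, while `-o` is FastQC's output directory option (the directory must already exist, so the helper creates it).

```shell
# Run FastQC on several FASTQ files, writing all reports into one directory.
# FastQC's -o option expects an existing directory, so create it first.
run_fastqc_all() {
    outdir="$1"
    shift
    mkdir -p "$outdir" || return 1
    fastqc -o "$outdir" "$@"
}

# Usage (needs fastqc on PATH; "fastqc_out" is an assumed folder name):
# run_fastqc_all fastqc_out Akle_TTAGGC_L004_R1_001.fastq Akle_TTAGGC_L004_R2_001.fastq
```

Keeping the reports in their own folder means MultiQC can later be pointed at that folder alone.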
(base) 202031107010173@xiaoming-HP:~/Biosofts$ conda activate python2
(python2) 202031107010173@xiaoming-HP:~/Biosofts$ multiqc .
/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/utils/config.py:44: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
configs = yaml.load(f)
/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/utils/config.py:50: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
sp = yaml.load(f)
[INFO ] multiqc : This is MultiQC v1.0.dev0
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching '.'
[ERROR ] multiqc : Oops! The 'prokka' MultiQC module broke...
Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues
(if possible, include a log file that triggers the error)
============================================================
Module prokka raised an exception: Traceback (most recent call last):
File "/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/EGG-INFO/scripts/multiqc", line 346, in multiqc
output = mod()
File "/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/modules/prokka/prokka.py", line 31, in __init__
self.parse_prokka(f)
File "/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/site-packages/multiqc-1.0.dev0-py2.7.egg/multiqc/modules/prokka/prokka.py", line 87, in parse_prokka
first_line = f['f'].readline()
File "/disk1/202031107010173/anaconda3/envs/python2/lib/python2.7/codecs.py", line 314, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd4 in position 3116: invalid continuation byte
============================================================
[INFO ] fastqc : Found 2 reports
[INFO ] multiqc : Report : multiqc_report.html
[INFO ] multiqc : Data : multiqc_data
[INFO ] multiqc : MultiQC complete
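The prokka error in this log comes from a module that is not needed for this task. A possible way to avoid it is to restrict MultiQC to the fastqc module, as sketched below. The helper name `run_multiqc_fastqc_only` and the default folder `multiqc_out` are assumptions; `--module` and `--outdir` are MultiQC options, but it is worth confirming them with `multiqc --help` on an old version such as v1.0.dev0.

```shell
# Run MultiQC with only the fastqc module, writing the report to its own folder.
run_multiqc_fastqc_only() {
    indir="${1:-.}"
    outdir="${2:-multiqc_out}"
    multiqc --module fastqc --outdir "$outdir" "$indir"
}

# Usage (needs multiqc on PATH, e.g. after `conda activate python2`):
# run_multiqc_fastqc_only .
```

With only one module enabled, other modules never open the unrelated files in the directory.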
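The prokka module still fails, but this time the fastqc module finds 2 reports and MultiQC writes its output. Listing the Biosofts folder again shows two new entries: multiqc_data/ and multiqc_report.html
Then download multiqc_report.html to the desktop and open it in a browser to view the combined results
Link to the report: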
[MultiQC Report](file:///C:/Users/hesicheng/Desktop/multiqc_report.html)
Screenshot:
image.png
Done