安装软件

这里要用到fastqc和python，先检查自己有没有

fastqc -h

是有安装的

如果没有就要按照教程进行安装，如下：

## fastqc使用多线程要用--threads

wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip
unizp fastqc_v0.11.8.zip
ln -s ~/tools/FastQC/fastqc ~/tools/bin/fastqc

mkdir -p 2.qc/{pre, post}
nohup fastqc --outdir ../2.qc/pre --threads 16 *.fastq.gz > ../2.qc/pre/fastqc.log 2>&1 &


## 需要先安装python
## sudo apt install libffi-dev
cd
cd tools/
mkdir python
wget https://www.python.org/ftp/python/3.7.3/Python-3.7.3.tgz
tar zxvf Python-3.7.3.tgz
cd Python-3.7.3
./configure --prefix="/home/llwu/tools/python/"
make
make install
cd ../python/bin
ls * | while read id
do
ln -s ~/tools/python/bin/${id} ~/tools/bin/${id}
done

这里的链接可能也是过时的，需要去找新的进行下载
我检查了一下我的python的版本是

(/home/jiarongf/my-envs/wes) 10:11:01 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/0.sra
$
python --version
Python 2.7.15

不知道后面是否会有影响，结果发现自己的安装实在home下的环境，所以赶紧去更改了一下，具体可以看我发布的一篇更改虚拟环境位置的文章，
更改之后就python就是3.8的了

(/home/jiarongf/my-envs/wes) 10:36:29 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/0.sra
$
python --version
Python 3.8.8

更改了之后记得重新进行wes的环境

source activate wes
(wes) 11:18:00 jiarongf@172.16.10.223:/data1/jiarongf

还会用到MultiQC,进行安装,也是直接安装在run文件夹下
下面是安装的脚本，可以分开运行

curl -LOk https://github.com/ewels/MultiQC/archive/master.zip
unzip master.zip
cd MultiQC-master
python setup.py install

下载了之后才发现这个可以用pip或者conda安装

Installation

You can install MultiQC from PyPI
using pip as follows:

bash

pip install multiqc

Alternatively, you can install using Conda
from the bioconda channel:

conda install -c bioconda multiqc

If you would like the development version instead, the command is:

pip install --upgrade --force-reinstall git+https://github.com/ewels/MultiQC.git

但是此处我已经下载了，所以直接安装就好了

python setup.py install

安装完成

检测安装是否成功

multiqc  -h

image.png

去接头前质控

首先需要把之前生成的fastq的文件放到一个目录下，定义为raw_data

image.png

(wes) 15:48:37 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
vim pre_qc_report.sh
(wes) 15:49:31 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
sh pre_qc_report.sh

  /// MultiQC 🔍 | v1.13.dev0

|           multiqc | Report title:  QC REPORT OF SRP070662
|           multiqc | Search path : /data1/jiarongf/learning/cancer-WES/0.sra/raw_data
|         searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 60/60
|           multiqc | No analysis results found. Cleaning up..
|           multiqc | MultiQC complete
(wes) 15:49:51 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
cat pre_qc_report.sh
multiqc ../0.sra/raw_data/ -n pre_qc_report -p -i " QC REPORT OF SRP070662" -o multiqc
(wes) 15:52:06 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
sh pre_qc_report.sh > ../log/pre_qc_report.log 2>&1

但是感觉没有输出结果
这里突然感觉应该先fastqc，然后再去multi，找到一个好的教程就是从下载sra数据，解压为fastq，在做fastqc在multi，然后合成一个报告，还有报告结果的解读都很不错
MultiQC使用 | 码农家园 (codenong.com)

fastqc

浅试一个

(wes) 15:58:02 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
fastqc /data1/jiarongf/learning/cancer-WES/0.sra/raw_data/case3_techrep_2_WES_2.fastq.gz
Started analysis of case3_techrep_2_WES_2.fastq.gz
Approx 5% complete for case3_techrep_2_WES_2.fastq.gz
Approx 10% complete for case3_techrep_2_WES_2.fastq.gz
Approx 15% complete for case3_techrep_2_WES_2.fastq.gz
Approx 20% complete for case3_techrep_2_WES_2.fastq.gz
Approx 25% complete for case3_techrep_2_WES_2.fastq.gz
Approx 30% complete for case3_techrep_2_WES_2.fastq.gz
Approx 35% complete for case3_techrep_2_WES_2.fastq.gz
Approx 40% complete for case3_techrep_2_WES_2.fastq.gz
Approx 45% complete for case3_techrep_2_WES_2.fastq.gz
Approx 50% complete for case3_techrep_2_WES_2.fastq.gz
Approx 55% complete for case3_techrep_2_WES_2.fastq.gz
Approx 60% complete for case3_techrep_2_WES_2.fastq.gz
Approx 65% complete for case3_techrep_2_WES_2.fastq.gz
Approx 70% complete for case3_techrep_2_WES_2.fastq.gz
Approx 75% complete for case3_techrep_2_WES_2.fastq.gz
Approx 80% complete for case3_techrep_2_WES_2.fastq.gz
Approx 85% complete for case3_techrep_2_WES_2.fastq.gz
Approx 90% complete for case3_techrep_2_WES_2.fastq.gz
Approx 95% complete for case3_techrep_2_WES_2.fastq.gz
Analysis complete for case3_techrep_2_WES_2.fastq.gz
(wes) 16:05:53 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$

时间太久了，建议后台跑，再有就是这个成功的，
跑完以后再原来的目录生成了

image.png

这只是一个的，所以需要循环一下，run目录下写个脚本
前面的脚本pre_qc_report.sh也需要改个名字

(wes) 16:08:26 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
mv pre_qc_report.sh pre_qc_multiqc.sh

上述做错了，重新搞

去接头前质控

(wes) 16:33:05 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
$
cd /data1/jiarongf/learning/cancer-WES
(wes) 16:31:37 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
$
mkdir -p 2.qc/{pre,post}
(wes) 16:33:01 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
$
l
0.sra/  2.qc/  data/  log/  run/
(wes) 16:33:02 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
$
ls 2.qc/
post  pre

质控

nohup fastqc --outdir ../2.qc/pre --threads 16 *.fastq.gz > ../2.qc/pre/fastqc.log 2>&1 &

(wes) 16:35:55 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
vim pre_fastqc.sh

(wes) 16:41:22 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
cat pre_fastqc.sh
nohup fastqc --outdir ../2.qc/pre --threads 16 ../0.sra/raw_data/*.fastq.gz > ../log/pre.fastqc.log 2>&1 &
(wes) 16:41:09 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
sh pre_fastqc.sh
(wes) 16:41:30 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$

直接挂到后台去，

image.png

一共有60个，

image.png

16个16个的跑，感觉也要跑很久，可以先做去接头

image.png

跑完啦

/data1/jiarongf/learning/cancer-WES/2.qc/pre

image.png

可以multi了

10:24:45 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
vim pre_qc_multiqc.sh
10:26:37 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
cat pre_qc_multiqc.sh
multiqc /data1/jiarongf/learning/cancer-WES/2.qc/pre  -n pre_qc_report -p -i " QC REPORT OF SRP070662" -o /data1/jiarongf/learning/cancer-WES/2.qc/pre/
10:26:42 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
sh pre_qc_multiqc.sh > ../log/pre_qc_multiqc.sh.log 2>&1
10:28:29 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$