We previously used FastQC + Trimmomatic for read quality control. Because fastp is fast and convenient, we have switched to fastp for QC.
Trimmomatic: the parameters we use (taking MALBAC_1_step_lib as an example)
Detailed Trimmomatic parameter usage:
https://www.jianshu.com/p/a8935adebaae
Common adapter sequences in high-throughput sequencing:
https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt
ILLUMINACLIP = "ILLUMINACLIP:" + ADAPTERS + ":2:20:6"
SLIDINGWINDOW = "SLIDINGWINDOW:4:15"
LEADING = "LEADING:3"
TRAILING = "TRAILING:3"
MINLEN = "MINLEN:25"
CROP = 60
HEAD_CROP = 12
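As a rough illustration of how these settings end up on a Trimmomatic command line, here is a minimal Python sketch. Several things in it are assumptions and not taken from our pipeline: paired-end mode, the jar path `trimmomatic.jar`, the adapter file `adapters.fa`, and all input and output file names. Trimmomatic applies steps in the order they are given, and the order used here simply follows the list above, so check it against your own pipeline before reusing it.

```python
import shlex

# Hypothetical paths: replace with your own adapter file and jar location.
ADAPTERS = "adapters.fa"
TRIMMOMATIC_JAR = "trimmomatic.jar"


def trimmomatic_pe_cmd(r1, r2, prefix):
    """Build a paired-end Trimmomatic command from the settings listed above."""
    # Trimmomatic runs the steps in the order given; this order is an assumption.
    steps = [
        "ILLUMINACLIP:" + ADAPTERS + ":2:20:6",
        "SLIDINGWINDOW:4:15",
        "LEADING:3",
        "TRAILING:3",
        "MINLEN:25",
        "CROP:60",
        "HEADCROP:12",
    ]
    # PE mode writes four files: paired and unpaired reads for each mate.
    outputs = [
        f"{prefix}_R1.paired.fq.gz",
        f"{prefix}_R1.unpaired.fq.gz",
        f"{prefix}_R2.paired.fq.gz",
        f"{prefix}_R2.unpaired.fq.gz",
    ]
    return ["java", "-jar", TRIMMOMATIC_JAR, "PE", r1, r2, *outputs, *steps]


trimmomatic_cmd = trimmomatic_pe_cmd(
    "MALBAC_1_step_lib_R1.fq.gz",  # hypothetical input names
    "MALBAC_1_step_lib_R2.fq.gz",
    "MALBAC_1_step_lib",
)
print(shlex.join(trimmomatic_cmd))
```

Building the command as a list keeps it ready for `subprocess.run(trimmomatic_cmd, check=True)` without shell quoting issues.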
Detailed fastp parameter usage:
https://www.jianshu.com/p/6f492058da5b
https://github.com/OpenGene/fastp#base-correction-for-pe-data
http://www.biotrainee.com/thread-2540-1-1.html
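The fastp options corresponding to the Trimmomatic parameters are as follows:
1. Adapter trimming first: use the defaults.
2. Sliding-window trimming: window size 4, mean base quality per window 15. These two options only take effect when a cutting direction is enabled (--cut_front, --cut_tail, or --cut_right); the fastp documentation describes --cut_right as the one similar to Trimmomatic's SLIDINGWINDOW.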
--cut_window_size 4
--cut_mean_quality 15
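3. Quality trimming at the read ends (the counterpart of LEADING:3 and TRAILING:3). In recent fastp releases -5 and -3 are switches that take no value; the threshold of 3 is set through the per-direction options, with a window size of 1 to mimic LEADING/TRAILING.
--cut_front/-5 --cut_front_window_size 1 --cut_front_mean_quality 3
--cut_tail/-3 --cut_tail_window_size 1 --cut_tail_mean_quality 3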
4. Length filtering: discard reads that are too short
--length_required 25
5. Global trimming
Number of bases to trim from the start of each read:
read1: -f 12
read2: -F 12
Trim from the tail of each read so that it is no longer than the specified length:
read1: -b/--max_len1 60
read2: -B/--max_len2 60
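The five steps above can be put together into a single fastp call. The Python sketch below is one possible assembly, not our production command. It assumes paired-end input and a recent fastp release that supports the per-direction window options (`--cut_right_*`, `--cut_front_*`, `--cut_tail_*`), and all file names are hypothetical.

```python
import shlex


def fastp_pe_cmd(r1, r2, out1, out2):
    """Build a paired-end fastp command mirroring the Trimmomatic settings."""
    return [
        "fastp",
        "-i", r1, "-I", r2,
        "-o", out1, "-O", out2,
        # 1. Adapter trimming is enabled by default, so no option is needed.
        # 2. Sliding window, similar to SLIDINGWINDOW:4:15.
        "--cut_right",
        "--cut_right_window_size", "4",
        "--cut_right_mean_quality", "15",
        # 3. End trimming, similar to LEADING:3 and TRAILING:3 (window size 1).
        "--cut_front",
        "--cut_front_window_size", "1",
        "--cut_front_mean_quality", "3",
        "--cut_tail",
        "--cut_tail_window_size", "1",
        "--cut_tail_mean_quality", "3",
        # 4. Discard reads shorter than 25 bases.
        "--length_required", "25",
        # 5. Global trimming: drop 12 leading bases, cap length at 60.
        "--trim_front1", "12", "--trim_front2", "12",
        "--max_len1", "60", "--max_len2", "60",
    ]


fastp_cmd = fastp_pe_cmd(
    "MALBAC_1_step_lib_R1.fq.gz",  # hypothetical file names
    "MALBAC_1_step_lib_R2.fq.gz",
    "MALBAC_1_step_lib_R1.clean.fq.gz",
    "MALBAC_1_step_lib_R2.clean.fq.gz",
)
print(shlex.join(fastp_cmd))
```

fastp applies its operations in its own fixed order regardless of how the options are arranged, so the result may differ slightly from Trimmomatic's; comparing read-length distributions on a small sample is a reasonable sanity check.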