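The previous post introduced one method for removing redundancy from a genome assembly: https://www.jianshu.com/p/f638ce6b7c8f
Other de-redundancy methods exist, and several tools can be used together to get the best result.
This post covers another approach: de-duplicating the assembly with purge_dups, which is recommended on the Canu website.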
Software overview
purge haplotigs and overlaps in an assembly based on read depth
Dependencies
1. zlib
2. minimap2
3. runner (optional)
4. python3 (optional)
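Before installing, it can help to check which of the command-line tools are already on PATH. The following is a minimal sketch, not part of purge_dups itself; the function name missing_tools is hypothetical, and zlib is left out because it is a library rather than a command.

```shell
# Print the names of the given commands that cannot be found on PATH.
# Runs in a subshell so its variables do not leak into the caller.
missing_tools() (
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing.
    echo "${missing# }"
)

# git and make are needed to build from source; minimap2 is required at run time;
# python3 is only needed for the optional runner pipeline.
missing=$(missing_tools git make minimap2 python3)
if [ -n "$missing" ]; then
    echo "Not found on PATH: $missing"
else
    echo "All listed tools were found"
fi
```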
Installation
1. Install purge_dups
git clone https://github.com/dfguan/purge_dups.git
cd purge_dups/src && make
2. Install runner
git clone https://github.com/dfguan/runner.git
cd runner && python3 setup.py install --user
Usage
#! /bin/bash
mkdir Purge_Dups
cd Purge_Dups
## 1. Data preparation
contig=sc.asm.hic.p_ctg.fa
ln -s ../$contig pri_asm.fa
pri_asm=pri_asm.fa
# $hifi must point to the HiFi reads file (FASTA/FASTQ); set it before running this step
minimap2 -xasm20 -t 10 $pri_asm $hifi | gzip -c - > hifi.paf.gz
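# Compute read depth from the read alignments (produces PB.base.cov and PB.stat)
~/biosoft/purge_dups-1.2.5/bin/pbcstat hifi.paf.gz
# Derive the read-depth cutoffs from the depth histogram
~/biosoft/purge_dups-1.2.5/bin/calcuts PB.stat > cutoffs 2> calcuts.log
# Split the assembly and align it against itself
~/biosoft/purge_dups-1.2.5/bin/split_fa $pri_asm > $pri_asm.split
minimap2 -x asm5 -DP $pri_asm.split $pri_asm.split | gzip -c - > $pri_asm.split.self.paf.gz
# Identify haplotigs and overlaps from the depth and the self-alignment
~/biosoft/purge_dups-1.2.5/bin/purge_dups -2 -T cutoffs -c PB.base.cov $pri_asm.split.self.paf.gz > dups.bed 2> purge_dups.log
# Extract the purged sequences from the original assembly
~/biosoft/purge_dups-1.2.5/bin/get_seqs -e dups.bed $pri_asm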
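After the script has run, it may be worth looking at what dups.bed flagged before trusting the result. The sketch below assumes that dups.bed is tab-separated, that the fourth column holds the type label (for example HAPLOTIG or OVLP), and that coordinates follow the usual BED half-open convention. The function name bed_summary is hypothetical.

```shell
# For each label in column 4, print: label, number of intervals, total length in bp.
bed_summary() {
    awk 'BEGIN { FS = "\t" }
         NF >= 4 { bp[$4] += $3 - $2; n[$4]++ }
         END { for (t in bp) print t, n[t], bp[t] }' "$1" | sort
}

# Only runs if dups.bed exists in the current directory.
if [ -e dups.bed ]; then
    bed_summary dups.bed
fi
```

If one label accounts for an unexpectedly large share of the assembly, the cutoffs file and purge_dups.log are the first places to look.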
Output files
De-duplicated assembly: Purge_Dups/purged.fa
Removed haplotig sequences: Purge_Dups/hap.fa
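A quick way to see how much sequence was removed is to compare sequence counts and total bases before and after purging. This is a minimal sketch using awk; the function name fasta_stats is hypothetical, and the file names assume the commands are run inside Purge_Dups.

```shell
# Print "<number of sequences> <total bases>" for a FASTA file.
fasta_stats() {
    awk '/^>/ { n++; next }
         { gsub(/[ \t\r]/, ""); bases += length($0) }
         END { print n + 0, bases + 0 }' "$1"
}

# Report each file that exists: input assembly, purged assembly, removed haplotigs.
for fa in pri_asm.fa purged.fa hap.fa; do
    if [ -e "$fa" ]; then
        echo "$fa: $(fasta_stats "$fa")"
    fi
done
```

The totals for purged.fa and hap.fa together should come close to the total for pri_asm.fa.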