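This assembler is a fairly old release, but it is more convenient to use than SOAPdenovo.
Sequence assembly is a crucial step in processing genome sequencing data, and it matters especially for the huge volumes of short reads produced by high-throughput sequencing.
By a rough count there are at least a dozen assemblers. The better-known ones include SOAPdenovo, idba_ud, ABySS, and Velvet, and each has its own strengths and weaknesses.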
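Assembly approaches fall broadly into two families: overlap-based and de Bruijn graph based.
The former joins reads by the overlap between the end of one read and the start of another, and suits first-generation sequencing data. The latter cuts reads into shorter fragments called k-mers and assembles by combining those k-mers, which suits second-generation high-throughput sequencing.
idba_ud is a de novo assembler for short reads with varying sequencing depth, built on an iterative de Bruijn graph. It iterates from a small k-mer size to a large one, and at each step contigs that are short or fall below a depth threshold are removed, which lets it assemble both low-depth and high-depth regions.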
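As a rough illustration of the k-mer idea, the shell sketch below cuts one read into overlapping k-mers and repeats the cut for increasing k, the way an iterative approach steps through k values. The function name `kmers`, the sample read, and the k range are made up for this example; this is not IDBA's own code and it does not build a graph.

```shell
# Hypothetical helper (not part of IDBA): print every k-mer of a sequence.
# $1 = sequence, $2 = k
kmers() {
  awk -v s="$1" -v k="$2" 'BEGIN {
    for (i = 1; i + k - 1 <= length(s); i++) print substr(s, i, k)
  }'
}

read_seq=ACGTAC   # made-up read

# Step k from small to large, as an iterative approach would.
for k in 3 4 5; do
  echo "k=$k: $(kmers "$read_seq" "$k" | tr '\n' ' ')"
done
```

A small k yields many short k-mers that overlap easily even at low depth, while a large k yields fewer, more specific k-mers, which is why running through several k values helps with uneven depth.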
The newer version can be downloaded from:
https://github.com/loneknightpy/idba
References
Peng, Y., et al. (2010) IDBA - A Practical Iterative de Bruijn Graph De Novo Assembler. RECOMB. Lisbon.
Peng, Y., et al. (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth, Bioinformatics, 28, 1420-1428.
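Installation steps:
1. $ git clone https://github.com/loneknightpy/idba.git
Change into the cloned directory.
2. $ ./build.sh
3. $ ./configure --prefix=<absolute path> (needed because I have no root permission)
4. $ make
5. Add this path to the PATH environment variable:
export PATH=<absolute path>:$PATH
The programs can then be used directly.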
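A quick way to confirm that the PATH change took effect is to ask the shell where it finds the program. This is a minimal sketch: `on_path` is a made-up helper name, and `idba_ud` is assumed to be the name of the binary you built.

```shell
# Hypothetical helper: succeed if a command can be found on PATH.
on_path() {
  command -v "$1" >/dev/null 2>&1
}

# idba_ud is assumed to be the binary name; adjust if your build differs.
if on_path idba_ud; then
  echo "idba_ud found at: $(command -v idba_ud)"
else
  echo "idba_ud not on PATH; check the export line above"
fi
```

If the export line only lives in the current terminal, it is lost when the session ends, so it is usually placed in the shell startup file as well.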
Comments
Note that IDBA assemblers are designed for short reads (around 100bp). If you want to assemble paired-end reads with longer read length, please modify the constant kMaxShortSequence in src/sequence/short_sequence.h to support longer read length.
Please find the manual by running the assembler without any parameters. For example:
$ bin/idba
IDBA series assemblers accept FASTA format reads. FASTQ format reads can be converted by fq2fa program in the package.
$ bin/fq2fa read.fq read.fa
IDBA-UD, IDBA-Hybrid and IDBA-Tran require paired-end reads stored in the same FASTA file. The two reads of a pair should be in consecutive records. If they are not, please use fq2fa to merge the two FASTQ read files into a single file.
$ bin/fq2fa --merge --filter read_1.fq read_2.fq read.fa
or convert a FASTQ read file to FASTA file.
$ bin/fq2fa --paired --filter read.fq read.fa
This tool assumes that the paired-end reads are in order (->, <-). If your data is in reverse order (<-, ->), please convert it by yourself.
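Since a paired file stores reads two by two, one cheap sanity check after conversion is that the number of FASTA records is even. The sketch below is an assumption-laden check, not part of the IDBA package: `count_records` and `is_even_paired` are made-up names, and an even count does not prove that mates are correctly adjacent.

```shell
# Hypothetical sanity check: count FASTA records (header lines start with ">").
count_records() {
  awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# A paired file holds reads two by two, so the count should be even.
is_even_paired() {
  n=$(count_records "$1")
  [ $((n % 2)) -eq 0 ]
}

# read.fa is the example output name used in the commands above.
if [ -f read.fa ]; then
  if is_even_paired read.fa; then
    echo "read.fa: record count is even"
  else
    echo "read.fa: odd record count, some pairs are broken"
  fi
fi
```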
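Tip: by default idba_ud only supports reads up to 128 bp long.
If your reads are longer than 128 bp, change the value of kMaxShortSequence in src/sequence/short_sequence.h.
My longest read is 150 bp, so I set the value to 150. After the change you must recompile and reinstall; run make clean first to clear the previous build.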
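To decide what value kMaxShortSequence needs, it helps to know the longest read in your data. The following is a minimal sketch, assuming plain FASTQ with exactly four lines per record; `max_read_len` is a made-up helper name and `read.fq` is just the example file name from the commands above.

```shell
# Hypothetical helper: longest read in a FASTQ file.
# Assumes plain 4-line records, so the sequence is line 2 of every 4.
max_read_len() {
  awk 'NR % 4 == 2 { if (length($0) > m) m = length($0) } END { print m + 0 }' "$1"
}

if [ -f read.fq ]; then
  echo "longest read: $(max_read_len read.fq) bp"
fi

# To see the current limit before editing it (run inside the idba directory):
#   grep -n kMaxShortSequence src/sequence/short_sequence.h
# After changing the value, rebuild:
#   make clean && make
```

If the reported length is at or below the current limit, no source change is needed.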