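Diamond is a tool for aligning query protein sequences against a protein database (blastp) or query nucleotide sequences against a protein database (blastx). Officially it is benchmarked at 500 to 20,000 times the speed of BLAST+, which sounds pretty impressive: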
- Pairwise alignment of proteins and translated DNA at 500x-20,000x speed of BLAST.
- Frameshift alignments for long read analysis.
- Low resource requirements and suitable for running on standard desktops or laptops.
- Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.
- The Chinese bioinformatics community 生信技能树 has a post comparing Diamond and BLAST+.
- The following introduction is quoted from Universität Tübingen:
(DIAMOND is a new alignment tool for aligning short DNA sequencing reads to a protein reference database such as NCBI-NR. On Illumina reads of length 100-150bp, in fast mode, DIAMOND is about 20,000 times faster than BLASTX, while reporting about 80-90% of all matches that BLASTX finds, with an e-value of at most 1e-5. In sensitive mode, DIAMOND is about 2,500 times faster than BLASTX, finding more than 94% of all matches.)
So it is worth learning.
Below is a brief summary of the official DIAMOND manual:
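- There are four main commands:
  - makedb: build a DIAMOND database from a protein FASTA file
  - blastp: align protein queries against the database
  - blastx: align translated DNA queries against the database
  - view: convert DIAMOND's binary DAA output into other formats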
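- -b: block size in billions of sequence letters, default 2.0. According to the manual, memory use is roughly six times this value in GB (about 12 GB at the default); if the machine has plenty of RAM, a larger value can be used
- -k: number of target sequences to report per query (default 25)
- -e: maximum E-value of reported alignments, default 0.001, which is much stricter than the BLAST+ default
- -f: output format; I like 6 best: as in BLAST+, it is tabular and lets you choose which fields to output
- -o: output file
- -p: number of CPU threads to use
- --min-score: minimum bit score to report; when it is set, -e is ignored
- --seg: low-complexity masking; when enabled, low-complexity segments of the query are masked
- -c: number of index chunks; larger values reduce both memory use and speed, 1 is the fastest setting, and it should not be set above 4 (the default)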
- -q: query file; both FASTA and FASTQ are accepted
- Below are some usage notes from DIAMOND's GitHub page:
- The program may use quite a lot of memory and also temporary disk space. Should the program fail due to running out of either one, you need to set a lower value for the block size parameter -b (see the manual).
- The default (fast) mode was mainly designed for short reads. For longer sequences, the sensitive modes (options --sensitive or --more-sensitive) are recommended.
- The runtime of the program is not linear in the size of the query file and it is much more efficient for large query files (> 1 million sequences) than for smaller ones.
- Low complexity masking is applied to the query and reference sequences by default. Masked residues appear in the output as X.
- The default e-value cutoff of DIAMOND is 0.001 while that of BLAST is 10, so by default the program will search a lot more stringently than BLAST and not report weak hits.
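To make the makedb / blastp / blastx workflow concrete, here is a minimal Python sketch that only assembles the argument lists for a makedb step followed by a search; it does not call DIAMOND itself. Besides the options listed above it uses --in and -d (the database name), which are DIAMOND flags, while the file names and the helper names makedb_cmd and search_cmd are hypothetical. If a diamond binary is installed, each list can be passed to subprocess.run.

```python
def makedb_cmd(fasta, db):
    """diamond makedb: build a DIAMOND database (db.dmnd) from a protein FASTA file."""
    return ["diamond", "makedb", "--in", fasta, "-d", db]


def search_cmd(mode, db, query, out, threads=4, evalue=1e-3, sensitivity=None):
    """Assemble a blastp (protein queries) or blastx (translated DNA queries) command."""
    if mode not in ("blastp", "blastx"):
        raise ValueError("mode must be 'blastp' or 'blastx'")
    cmd = ["diamond", mode, "-d", db, "-q", query, "-o", out,
           "-p", str(threads), "-e", str(evalue), "-f", "6"]
    # Optional sensitive modes, recommended for longer sequences.
    if sensitivity is not None:
        if sensitivity not in ("sensitive", "more-sensitive"):
            raise ValueError("sensitivity must be 'sensitive' or 'more-sensitive'")
        cmd.append("--" + sensitivity)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own data.
    steps = [
        makedb_cmd("proteins.faa", "proteins"),
        search_cmd("blastx", "proteins", "reads.fq", "matches.tsv", threads=8),
    ]
    for argv in steps:
        print(" ".join(argv))
```

Keeping the commands as lists (rather than one shell string) avoids quoting problems when file names contain spaces.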
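The advice to lower -b when memory runs out is easier to act on with the manual's rule of thumb that DIAMOND uses roughly six times the block size in GB of memory (about 12 GB for the default -b 2.0). The sketch below, with hypothetical helper names, simply inverts that estimate to suggest a block size for a given amount of RAM. Treat the result as a starting point only, since the real footprint also depends on -c and on the data.

```python
import math


def suggested_block_size(ram_gb, gb_per_block=6.0):
    """Largest -b value (one decimal place) whose estimated memory fits in ram_gb.

    Assumes the rule of thumb: memory (GB) ~= gb_per_block * block size,
    at default settings. This is an estimate, not a guarantee.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    # Round away float noise first, then round down so the estimate stays within RAM.
    b = math.floor(round(ram_gb / gb_per_block * 10, 9)) / 10
    if b <= 0:
        raise ValueError("not enough RAM for a block size of 0.1")
    return b


def estimated_memory_gb(block_size, gb_per_block=6.0):
    """Rough memory estimate in GB for a given -b value."""
    return gb_per_block * block_size


if __name__ == "__main__":
    for ram in (8, 16, 64):
        b = suggested_block_size(ram)
        print(f"{ram} GB RAM -> -b {b} (~{estimated_memory_gb(b):.1f} GB)")
```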
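Because -f 6 output is plain tab-separated text, post-processing needs only the standard library. The sketch below assumes the default 12 columns with no custom fields requested (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore, the same order as BLAST+ outfmt 6). It applies an extra E-value cutoff, such as the 1e-5 used in the Tübingen comparison, and keeps the highest-scoring hit per query. The sample rows and function names are made up for illustration.

```python
import csv
import io

# Default columns of tabular output (-f 6) when no fields are listed.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(handle):
    """Yield one dict per hit from default-column -f 6 output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        yield hit


def best_hits(hits, max_evalue=1e-3):
    """Return {query: hit} keeping the highest bit score among hits with evalue <= max_evalue."""
    best = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up rows for illustration only.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["read1", "protA", "95.0", "50", "2", "0", "1", "150", "10", "59", "1e-20", "100.5"],
    ["read1", "protB", "80.0", "48", "9", "1", "1", "144", "3", "50", "1e-10", "60.2"],
    ["read2", "protC", "40.0", "30", "18", "0", "4", "93", "1", "30", "0.01", "25.0"],
]) + "\n"

if __name__ == "__main__":
    # For a real run: with open("matches.tsv", newline="") as fh: parse_tabular(fh)
    best = best_hits(parse_tabular(io.StringIO(SAMPLE)), max_evalue=1e-5)
    for query, hit in best.items():
        print(query, hit["sseqid"], hit["evalue"], hit["bitscore"])
```

If you request custom fields with -f 6, adjust FIELDS to the same order.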