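ngmlr: an aligner for third-generation sequencing reads

Software name: ngmlr

Version: ngmlr 0.2.6

1. Overview

NextGenMap-LR (ngmlr) aligns long reads from third-generation sequencing platforms (PacBio, Oxford Nanopore) to a reference genome. Such reads have two defining features: (1) they are long, about 10 kb on average; (2) they have a high error rate (10% to 15% for PacBio, and 5% to 20% for Oxford Nanopore sequencing). NGMLR (https://github.com/philres/ngmlr) is a fast and accurate aligner designed for these reads. It is based on NGM (a seed-and-extend short-read aligner) and extends it with a segmented convex gap-cost scoring model to cope with long reads that have a high error rate.

Website: https://github.com/philres/ngmlr

2. How it works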

[Figure: ngmlr alignment workflow]
  1. Identify initial anchors
  2. Verify anchors with vectorized Smith-Waterman algorithm (scores only)
  3. Filter anchors and find candidate regions for the alignments
  4. Compute the full alignment between the read and the respective candidate reference regions

3. Usage

3.1 Example

  ngmlr -r ucsc.hg19.fasta -q XXX.fastq -o YYY.sam

3.2 Arguments

-r  reference genome

-q  long reads to align

-o  output file (SAM format)
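As a slightly fuller sketch of the same call, the function below also sets the thread count (-t) and the technology preset (-x) described in the parameter list. The function name run_ngmlr and the file names in the example call are placeholders chosen for illustration, not part of ngmlr.

```shell
# Hypothetical helper: align long reads with ngmlr and write the result to a file.
# $1 reference FASTA, $2 reads FASTQ, $3 output file,
# $4 threads (defaults to 1), $5 preset (defaults to pacbio)
run_ngmlr() {
    ngmlr -t "${4:-1}" -x "${5:-pacbio}" -r "$1" -q "$2" -o "$3"
}

# Example call (file names are placeholders):
# run_ngmlr ucsc.hg19.fasta XXX.fastq YYY.sam 8 pacbio
```

Keeping the defaults in one place makes it harder to forget -t, which otherwise leaves ngmlr running on a single thread.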

3.3 Parameter reference

Usage: ngmlr [options] -r <reference> -q <reads> [-o <output>]

Input/Output:

  • -r, --reference (the directory that holds the reference must be writable)
       (required) Path to the reference genome (FASTA/Q, can be gzipped)

  • -q, --query
       Path to the read file (FASTA/Q) [/dev/stdin]

  • -o, --output
       Path to output file [stdout]

  • --skip-write
       Don't write reference index to disk [false]

  • --bam-fix
       Report reads with > 64k CIGAR operations as unmapped. Required to be compatible with BAM format [false]
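To check whether an existing SAM file contains reads that --bam-fix would affect, one can count the CIGAR operations per record. The sketch below assumes the standard SAM layout (tab-separated, CIGAR in column 6) and uses 65535 as the default cut-off, on the assumption that this is the exact limit behind the "64k" in the help text. The function name list_long_cigars is made up for this example.

```shell
# Hypothetical helper: print "read name <TAB> operation count" for every
# record whose CIGAR has more operations than the limit.
# $1 SAM file, $2 limit (defaults to 65535, an assumed value for "64k")
list_long_cigars() {
    awk -F'\t' -v limit="${2:-65535}" '
        /^@/ { next }                    # skip SAM header lines
        $6 != "*" {
            cigar = $6
            # gsub returns how many operation letters it replaced
            n = gsub(/[MIDNSHP=X]/, "", cigar)
            if (n > limit) print $1 "\t" n
        }
    ' "$1"
}

# Example call (file name is a placeholder):
# list_long_cigars YYY.sam
```

If this prints nothing, no record in the file exceeds the limit.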
    

General:

  • -t, --threads
       Number of threads [1]

  • -x, --presets
       Parameter presets for different sequencing technologies [pacbio]

  • -i <0-1>, --min-identity <0-1>
       Alignments with an identity lower than this threshold will be discarded [0.65]

  • -R, --min-residues
       Alignments containing less than <int> or (<float> * read length) residues will be discarded [0.25]

  • --no-smallinv
       Don't detect small inversions [false]

  • --no-lowqualitysplit
       Split alignments with poor quality [false]

  • --verbose
       Debug output [false]

  • --no-progress
       Don't print progress info while mapping [false]

Advanced:

  • --match
       Match score [2]

  • --mismatch
       Mismatch score [-5]

  • --gap-open
       Gap open score [-5]

  • --gap-extend-max
       Gap open extend max [-5]

  • --gap-extend-min
       Gap open extend min [-1]

  • --gap-decay
       Gap extend decay [0.15]

  • -k <10-15>, --kmer-length <10-15>
       K-mer length in bases [13]

  • --kmer-skip
       Number of k-mers to skip when building the lookup table from the reference [2]

  • --bin-size
       Sets the size of the grid used during candidate search [4]

  • --max-segments
       Max number of segments allowed for a read per kb [1]

  • --subread-length
       Length of fragments reads are split into [256]

  • --subread-corridor
       Length of corridor sub-reads are aligned with [40]

3.4 Output

The alignments are written in SAM format:

[Figure: example SAM output]
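Since the output is SAM, a sorted and indexed BAM has to be produced in a separate step. One common way is samtools, which is not part of ngmlr and is assumed here to be installed; the function name and file names are placeholders. If BAM is the goal, it is probably worth running ngmlr with --bam-fix (see 3.3) so that every record fits the BAM format.

```shell
# Hypothetical helper: convert ngmlr's SAM output to a sorted, indexed BAM.
# Assumes samtools is on PATH.
# $1 input SAM, $2 output BAM, $3 additional sorting threads (defaults to 0)
sam_to_sorted_bam() {
    samtools sort -@ "${3:-0}" -o "$2" "$1" &&
        samtools index "$2"
}

# Example call (file names are placeholders):
# sam_to_sorted_bam YYY.sam YYY.sorted.bam 4
```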

4. Resource usage

[Figure: resource usage]

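5. Notes

1. ngmlr writes an index into the directory that holds the reference genome, so that directory must be writable. Alternatively, pass --skip-write so that the index is explicitly not written to disk.
2. Building the index for a reference genome takes a long time. It is better to build it once in the database directory (the folder that holds the reference genome) and reuse it on every run.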
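The two notes can be combined into a small pre-flight check: if the reference directory is writable, let ngmlr store its index there for reuse; if not, fall back to --skip-write and accept that the index is rebuilt on every run. The function name skip_write_flag is made up for this sketch.

```shell
# Hypothetical helper: print "--skip-write" when the directory that holds
# the reference is not writable, and print nothing otherwise.
# $1 = path to the reference FASTA
skip_write_flag() {
    ref_dir=$(dirname "$1")
    if [ -w "$ref_dir" ]; then
        :   # writable: ngmlr can keep its index next to the reference
    else
        printf '%s\n' "--skip-write"
    fi
}

# Example call (file names are placeholders; the substitution is left
# unquoted on purpose so that an empty result adds no argument):
# ngmlr $(skip_write_flag ucsc.hg19.fasta) -r ucsc.hg19.fasta -q XXX.fastq -o YYY.sam
```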

6. Citation

Accurate detection of complex structural variations using single molecule sequencing

Fritz J Sedlazeck, Philipp Rescheneder, Moritz Smolka, Han Fang, Maria Nattestad, Arndt von Haeseler, Michael Schatz. bioRxiv 169557; doi: https://doi.org/10.1101/169557

7. FAQs

Poster & Talks:

Accurate and fast detection of complex and nested structural variations using long read technologies. Biological Data Science, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 26 - 29.10.2016

NGMLR: Highly accurate read mapping of third generation sequencing reads for improved structural variation analysis. Genome Informatics 2016, Wellcome Genome Campus Conference Centre, Hinxton, Cambridge, UK, 19.09.-2.09.2016
