Introduction to the FASTQ Format

FASTQ read from the NCBI SRA

There are four line types in the FASTQ format. First comes the ‘@’ title line, which often holds just a record identifier. This is a free-format field with no length limit, which allows arbitrary annotation or comments to be included, as in the example above, where NCBI has included an alternative ID and the sequence length. Some sequencing centers encode paired-end read information here (alternatively, two matched FASTQ files are often used).

Second comes the sequence line(s), which as in the FASTA format can be line wrapped. Also like FASTA format, there is no explicit limitation on the characters expected, but restriction to the IUPAC single letter codes for (ambiguous) DNA or RNA is wise, and upper case is conventional. In some contexts, the use of lower or mixed case or the inclusion of a gap character may make sense. White space such as tabs or spaces is not permitted.
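The recommendations in this paragraph can be turned into a small check. The following is only a sketch: the function name `check_sequence`, its `allow_lower` and `allow_gaps` switches, and the choice of ‘-’ and ‘.’ as gap characters are assumptions made for illustration, not part of any FASTQ specification.

```python
# IUPAC single-letter nucleotide codes (DNA and RNA, including ambiguity codes).
IUPAC_NUCLEOTIDES = frozenset("ACGTURYSWKMBDHVN")
# Assumption for this sketch: '-' and '.' are the gap characters of interest.
GAP_CHARS = frozenset("-.")


def check_sequence(seq, allow_lower=False, allow_gaps=False):
    """Return a list of (position, character, reason) for each questionable character."""
    problems = []
    for pos, ch in enumerate(seq):
        if ch.isspace():
            # White space is the one thing the format rules out entirely.
            problems.append((pos, ch, "whitespace is not permitted"))
        elif ch in GAP_CHARS:
            if not allow_gaps:
                problems.append((pos, ch, "gap character"))
        elif ch.upper() not in IUPAC_NUCLEOTIDES:
            problems.append((pos, ch, "not an IUPAC nucleotide code"))
        elif ch.islower() and not allow_lower:
            problems.append((pos, ch, "lower case"))
    return problems


if __name__ == "__main__":
    for example in ("ACGTN", "AC GT", "acgt", "AC-GT", "ACXGT"):
        print(repr(example), check_sequence(example))
```

Only the whitespace rule is a hard requirement of the format; the other checks reflect the conventions described above, which is why they can be relaxed through the two switches.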

Third, to signal the end of the sequence lines and the start of the quality string, comes the ‘+’ line. Originally this also included a full repeat of the title line text (as shown in the NCBI example above); however, by common usage and the MAQ tool convention, this is optional and the ‘+’ line can contain just this one character, reducing the file size significantly. The OBF tools follow this MAQ convention on output, and omit the optional repeated title text.

Finally come the quality line(s), which again can be wrapped. These use a subset of the printable ASCII characters (at most ASCII 33–126 inclusive) with a simple offset mapping; in the Sanger variant the offset is 33, so ‘!’ encodes a quality score of 0. Crucially, after concatenation (removing line breaks), the quality string must be equal in length to the sequence string.
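The four line types, including line wrapping, can be read with a short parser. This is a minimal sketch rather than a reference implementation: the names `parse_fastq` and `decode_sanger_quality` are made up for illustration, and the decoder assumes the Sanger variant (offset 33). Because ‘@’ and ‘+’ are themselves valid quality characters, the sketch does not look for the next ‘@’ to find the end of a record; it reads quality lines until their combined length reaches the sequence length.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (title, sequence, quality) tuples; wrapped lines are joined."""
    it = iter(lines)
    for line in it:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines between records
        if not line.startswith("@"):
            raise ValueError("expected '@' title line, got %r" % line)
        title = line[1:]

        # Sequence line(s): everything up to the '+' line.
        seq_parts = []
        plus_seen = False
        for line in it:
            line = line.rstrip("\r\n")
            if line.startswith("+"):
                plus_seen = True
                repeated = line[1:]
                # The repeated title is optional, but must match if present.
                if repeated and repeated != title:
                    raise ValueError("'+' line does not repeat the title")
                break
            seq_parts.append(line)
        if not plus_seen:
            raise ValueError("record %r has no '+' line" % title)
        seq = "".join(seq_parts)

        # Quality line(s): read until as many characters as the sequence has.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError("record %r is truncated" % title)
            line = line.rstrip("\r\n")
            qual_parts.append(line)
            qual_len += len(line)
        qual = "".join(qual_parts)
        if len(qual) != len(seq):
            raise ValueError("record %r: quality and sequence lengths differ" % title)
        yield title, seq, qual


def decode_sanger_quality(qual: str, offset: int = 33) -> List[int]:
    """Map quality characters to integer scores using a fixed ASCII offset."""
    for ch in qual:
        if not 33 <= ord(ch) <= 126:
            raise ValueError("character %r is outside ASCII 33-126" % ch)
    return [ord(ch) - offset for ch in qual]


if __name__ == "__main__":
    example = ["@r1 desc", "ACGT", "NN", "+r1 desc", "!!!!", "@~", "@r2", "A", "+", "I"]
    for title, seq, qual in parse_fastq(example):
        print(title, seq, decode_sanger_quality(qual))
```

In the example, the second quality line of the first record begins with ‘@’, which is why counting characters is safer than treating every line that starts with ‘@’ as a new title.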

References

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research, Volume 38, Issue 6, April 2010, Pages 1767–1771, https://doi.org/10.1093/nar/gkp1137
