Introduction to the GTF/GFF genome annotation file formats
Lineage: GFF2 -> GTF -> GFF3. The GTF (General Transfer Format) is identical to GFF version 2, so everything below applies to GFF2 as well.
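One visible difference between the GFF2/GTF branch and GFF3 is the syntax of the last (attribute) column: GTF writes `tag "value";` pairs separated by a space, while GFF3 writes `tag=value` pairs. The function below is a rough heuristic that only inspects the first pair; `attribute_dialect` is a hypothetical helper, not part of any library.

```python
import re


def attribute_dialect(attr: str) -> str:
    """Guess the dialect of a column-9 attribute string (heuristic only).

    GTF/GFF2: tag "value"; tag "value";   (space between tag and value)
    GFF3:     tag=value;tag=value         (equals sign between tag and value)
    """
    # Look only at the first tag-value pair.
    first = attr.strip().split(";", 1)[0].strip()
    if re.match(r"^[^\s=]+=", first):
        return "GFF3"
    if re.match(r"^\S+\s+\S", first):
        return "GTF/GFF2"
    return "unknown"


print(attribute_dialect('gene_id "ENSG00000223972"; gene_name "DDX11L1";'))
print(attribute_dialect("ID=gene:ENSG00000223972;Name=DDX11L1"))
```

A real file should be identified by its header or extension as well; this check is only meant to make the syntax difference concrete.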
Each line describes one feature and has 9 tab-separated columns:
1. seqname - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix.
2. source - name of the program that generated this feature, or the data source (database or project name).
3. feature - feature type name, e.g. transcript, exon or intron (other examples: Gene, Variation, Similarity).
4. start - start position of the feature, with sequence numbering starting at 1.
5. end - end position of the feature, with sequence numbering starting at 1; the end position is inclusive.
6. score - a floating-point value, or '.' if there is no score.
7. strand - defined as + (forward) or - (reverse).
8. frame - one of '0', '1' or '2' ('.' for features where a reading frame does not apply). '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
9. attribute - a semicolon-separated list of tag-value pairs providing additional information about each feature, such as its IDs, its name, or the protein it encodes.
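A minimal Python sketch of reading one GTF line into the nine columns listed above. It assumes tab-separated columns, '.' for an empty score or frame, and attribute values that contain no embedded semicolons; `GtfRecord`, `parse_attributes` and `parse_gtf_line` are hypothetical names for illustration, not an existing library API.

```python
from typing import Dict, NamedTuple, Optional


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int                 # 1-based, inclusive
    end: int                   # 1-based, inclusive
    score: Optional[float]     # None when the column is "."
    strand: str                # "+" (forward) or "-" (reverse)
    frame: Optional[int]       # 0, 1 or 2; None when the column is "."
    attributes: Dict[str, str]


def parse_attributes(text: str) -> Dict[str, str]:
    """Parse 'tag "value"; tag "value";' into a dict.

    Simplification: if a tag appears more than once, the last value wins.
    """
    attrs: Dict[str, str] = {}
    for part in text.strip().split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing ';'
        tag, _, value = part.partition(" ")
        attrs[tag] = value.strip().strip('"')
    return attrs


def parse_gtf_line(line: str) -> GtfRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr = fields
    return GtfRecord(
        seqname=seqname,
        source=source,
        feature=feature,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        frame=None if frame == "." else int(frame),
        attributes=parse_attributes(attr),
    )
```

Because of the last-value-wins simplification, tags that legitimately repeat on one line would be collapsed; keep a list per tag if that matters.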
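Example (one line from an Ensembl GTF file; "------>" stands for a tab, and the seqname column is not shown, so the line starts with column 2, source):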
transcribed_pseudogene ------> gene ------> 11869 ------> 14409 ------> . ------> + ------> . ------> gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";
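Because start and end are both 1-based and inclusive, a feature's length is end - start + 1, and converting to the 0-based, half-open convention used by BED files and Python slicing only shifts the start down by one. A small sketch using the coordinates from the example line; both function names are illustrative, not from any library.

```python
from typing import Tuple


def feature_length(start: int, end: int) -> int:
    """Length of a GTF feature whose start and end are 1-based and inclusive."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def to_zero_based_half_open(start: int, end: int) -> Tuple[int, int]:
    """Convert 1-based inclusive [start, end] to 0-based half-open [start0, end0)."""
    return start - 1, end


print(feature_length(11869, 14409))           # length of the example gene
print(to_zero_based_half_open(11869, 14409))  # BED-style coordinates
```

For the example gene this gives 14409 - 11869 + 1 = 2541 bases, and a BED start of 11868 with the same end, 14409.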
References: