Introduction to Genome Annotation File Formats (GTF/GFF)


GFF 2 -> GTF -> GFF 3

GTF (Gene Transfer Format, also expanded as General Transfer Format in the Ensembl documentation) is identical to GFF version 2. In practice, GTF is simply GFF version 2.

Each line has the following format (each number below is one column; there are 9 tab-separated columns in total):

1. seqname - Name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix.

2. source - Name of the program that generated this feature, or the data source (database or project name).

3. feature - Feature type name (transcript, exon, intron and so on), e.g. Gene, Variation, Similarity.

4. start - Start position of the feature, with sequence numbering starting at 1.

5. end - End position of the feature, with sequence numbering starting at 1.

6. score - A floating point value, or '.' when no score is given.

7. strand - Defined as + (forward) or - (reverse).

8. frame - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on. Use '.' when the frame does not apply.

9. attribute - A semicolon-separated list of tag-value pairs providing additional information about each feature (for example, the gene ID, gene name or biotype).
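The nine columns above can be read with a few lines of Python. This is a minimal sketch rather than a reference parser: the names `GtfRecord` and `parse_gtf_line` are hypothetical, the input line is made up for illustration, and the code assumes tab-separated fields with '.' marking an empty score or frame.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GtfRecord:
    """One GTF/GFF2 line, split into its nine columns (hypothetical helper type)."""
    seqname: str
    source: str
    feature: str
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    score: Optional[float]  # None when the column holds '.'
    strand: str
    frame: Optional[int]    # None when the column holds '.'
    attribute: str          # raw attribute string, left unparsed here

    @property
    def length(self) -> int:
        # Both coordinates are inclusive, so the length is end - start + 1.
        return self.end - self.start + 1


def parse_gtf_line(line: str) -> GtfRecord:
    """Split one tab-separated line into a GtfRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attribute = fields
    return GtfRecord(
        seqname=seqname,
        source=source,
        feature=feature,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        frame=None if frame == "." else int(frame),
        attribute=attribute,
    )


# Made-up line, for illustration only.
example = "\t".join(
    ["chr1", "example_source", "exon", "100", "250", ".", "+", "0", 'gene_id "G1";']
)
record = parse_gtf_line(example)
print(record.feature, record.start, record.end, record.length)
```

Because start and end are both 1-based and inclusive, the made-up exon spanning 100 to 250 has a length of 151 bases, not 150.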

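Example (each `------>` stands for a tab; the seqname column is not shown in this excerpt, so the line starts at the source column):

transcribed_pseudogene ------> gene ------> 11869 ------> 14409 ------> . ------> + ------> . ------> gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";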
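The attribute column of this example can be split into tag-value pairs as sketched below. The helper `parse_attributes` is hypothetical, and the regular expression assumes the `tag "value";` layout seen in the example. It does not handle GFF3-style `tag=value` pairs, unquoted values, or quotes embedded in a value.

```python
import re

# Matches one `tag "value";` pair, e.g. gene_name "DDX11L1";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_attributes(attribute: str) -> dict:
    """Turn a GTF attribute string into a dict (hypothetical helper).

    If a tag occurs more than once, the last value wins.
    """
    return {tag: value for tag, value in ATTR_RE.findall(attribute)}


attribute = (
    'gene_id "ENSG00000223972"; gene_name "DDX11L1"; '
    'gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";'
)
attrs = parse_attributes(attribute)
print(attrs["gene_id"], attrs["gene_name"])
```

A dict keeps the sketch short. A list of pairs would be the safer choice for files where the same tag can appear several times on one line.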

References:

https://www.biostars.org/p/99462/

http://www.ensembl.org/info/website/upload/gff.html
