Guide to GO evidence codes

Origin: http://geneontology.org/docs/guide-go-evidence-codes/

Evidence codes fall into six general categories:
  • experimental evidence
  • phylogenetic evidence
  • computational evidence
  • author statements
  • curatorial statements
  • automatically generated annotations

Experimental evidence codes

  The EXPerimental (EXP) evidence codes indicate that there is evidence from an experiment directly supporting the annotation of the gene.

Phylogenetically-inferred annotations

  Phylogenetically-based annotations are derived from an explicit model of gain and loss of gene function at specific branches in a phylogenetic tree. Each inferred annotation can be traced to the direct experimental annotations that were used as the basis for that assertion. The GO Phylogenetic Annotation project is now the largest source of manually reviewed annotations in the GO knowledgebase, and it has substantially increased the number of annotations even in organisms that have been well-studied experimentally.

Computational analysis evidence codes

  Use of the computational analysis evidence codes indicates that the annotation is based on an in silico analysis of the gene sequence and/or other data as described in the cited reference. The evidence codes in this category also indicate a varying degree of curatorial input. The computational analysis evidence codes are:

  • Inferred from Sequence or structural Similarity (ISS)
      The ISS evidence code or one of its sub-categories should be used whenever a manual, sequence-based analysis forms the basis for an annotation. If the annotation has not been reviewed manually, the correct evidence code is IEA, even if the evidence supporting the annotation is all sequence based. ISS should be used if a combination of sequence-based tools or methods are used. If only one particular type of sequence-based evidence is used then one of the more specific sub-categories of ISS may be more appropriate for the annotation.
      An ISS annotation is often based on more than just one type of sequence-based evidence. Often, a host of searches are performed for any given query protein. These searches might include BLAST, profile HMMs, TMHMM, SignalP, PROSITE, InterPro, etc. Evaluation of output from these search tools (bear in mind that every search may not yield results for every protein) leads an annotator to a particular ISS annotation for a particular protein. For example, a BLAST search might reveal that a query protein matches an experimentally characterized protein from another species at 50% identity over the full lengths of both proteins. After reading literature about the match protein, the curator sees that the match protein is known to contain a domain located in the plasma membrane and another domain that extends into the cytoplasm. It is also known from the literature that the experimentally characterized match protein requires the binding of ATP to function. TMHMM analysis of the query protein predicts several membrane spanning regions in one half of the protein (consistent with location in a membrane). In addition there are PROSITE and Pfam results which reveal the presence of an ATP-binding domain in the other half of the protein which TMHMM predicts to be cytoplasmic. These four search results taken together point to a probable identification of the query protein as having the function of the match protein.
  • Inferred from Sequence Orthology (ISO)
  • Inferred from Sequence Alignment (ISA)
  • Inferred from Sequence Model (ISM)
  • Inferred from Genomic Context (IGC)
  • Inferred from Reviewed Computational Analysis (RCA)

Electronic annotation evidence code

‘Electronic’ (IEA) annotation are not manually reviewed (although the method itself is usually subjected to various quality assessments). IEA-supported annotations are ultimately based on either homology and/or other experimental or sequence information, but cannot generally be traced to an experimental source. Three methods make up the bulk of these annotations. The first, and most comprehensive, method is InterPro2GO, which is based on the curated association of a GO term with **a generalized sequence model **(‘signature’) of a group of homologous proteins. Protein sequences with a statistically significant match to a signature are assigned the GO terms associated with the signature, a form of homology inference. A second method is the computational conversion of UniProt controlled vocabulary terms (including Enzyme Commission numbers describing enzymatic activities, and UniProt keywords describing subcellular locations), to associated GO terms. Lastly, annotations are made based on 1:1 orthologs inferred from Ensembl gene trees, an approach which automatically transfers annotations found experimentally in one gene, to its 1:1 orthologs in the same taxonomic clade (e.g. those within the vertebrate clade, and separately, those within the plant clade).

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容

  • 久违的晴天,家长会。 家长大会开好到教室时,离放学已经没多少时间了。班主任说已经安排了三个家长分享经验。 放学铃声...
    飘雪儿5阅读 7,588评论 16 22
  • 今天感恩节哎,感谢一直在我身边的亲朋好友。感恩相遇!感恩不离不弃。 中午开了第一次的党会,身份的转变要...
    迷月闪星情阅读 10,620评论 0 11
  • 在妖界我有个名头叫胡百晓,无论是何事,只要找到胡百晓即可有解决的办法。因为是只狐狸大家以讹传讹叫我“倾城百晓”,...
    猫九0110阅读 3,360评论 7 3
  • 彩排完,天已黑
    刘凯书法阅读 4,336评论 1 3