Guide to GO evidence codes

Origin: http://geneontology.org/docs/guide-go-evidence-codes/

Evidence codes fall into six general categories:

experimental evidence
phylogenetic evidence
computational evidence
author statements
curatorial statements
automatically generated annotations

Experimental evidence codes

The EXPerimental (EXP) evidence codes indicate that there is evidence from an experiment directly supporting the annotation of the gene.

Inferred from Experiment (EXP)
This code is used in an annotation to indicate that an experimental assay has been located in the cited reference, whose results indicate a gene product's function, process involvement, or subcellular location (indicated by the GO term).
Inferred from Direct Assay (IDA)
Inferred from Physical Interaction (IPI)
Inferred from Mutant Phenotype (IMP)
Inferred from Genetic Interaction (IGI)
Inferred from Expression Pattern (IEP)
Each of the experimental evidence codes has a corresponding ‘high throughput’ (HTP) evidence code. HTP codes are a type of experimental evidence code indicating that the annotation is supported by high throughput methodologies. The high throughput evidence codes are:
Inferred from High Throughput Experiment (HTP)
Inferred from High Throughput Direct Assay (HDA)
Inferred from High Throughput Mutant Phenotype (HMP)
Inferred from High Throughput Genetic Interaction (HGI)
Inferred from High Throughput Expression Pattern (HEP)
The HTP code is equivalent to the conventional EXP code. The HTP code is the parent code for the HDA, HMP, HGI and HEP high throughput experimental codes.
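
The pairing between conventional and high throughput codes can be written down as a small lookup table. The following is a minimal Python sketch, not part of any GO tooling: the names HTP_COUNTERPART, HTP_PARENT, is_high_throughput and conventional_equivalent are made up for illustration. IPI is left out of the table because no high throughput counterpart is listed for it in this guide.

```python
from typing import Optional

# Conventional experimental code -> high throughput counterpart,
# following the two lists in this guide (names here are illustrative).
HTP_COUNTERPART = {
    "EXP": "HTP",
    "IDA": "HDA",
    "IMP": "HMP",
    "IGI": "HGI",
    "IEP": "HEP",
}

# HTP is the parent code of the more specific high throughput codes.
HTP_PARENT = {code: "HTP" for code in ("HDA", "HMP", "HGI", "HEP")}


def is_high_throughput(code: str) -> bool:
    """Return True if the code is HTP or one of its child codes."""
    return code == "HTP" or code in HTP_PARENT


def conventional_equivalent(code: str) -> Optional[str]:
    """Return the conventional code paired with a high throughput code, if any."""
    for conventional, high_throughput in HTP_COUNTERPART.items():
        if high_throughput == code:
            return conventional
    return None
```

For example, conventional_equivalent("HDA") returns "IDA", and is_high_throughput("IPI") returns False.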

Phylogenetically-inferred annotations

Phylogenetically-based annotations are derived from an explicit model of gain and loss of gene function at specific branches in a phylogenetic tree. Each inferred annotation can be traced to the direct experimental annotations that were used as the basis for that assertion. The GO Phylogenetic Annotation project is now the largest source of manually reviewed annotations in the GO knowledgebase, and it has substantially increased the number of annotations even in organisms that have been well-studied experimentally.

Computational analysis evidence codes

Use of the computational analysis evidence codes indicates that the annotation is based on an in silico analysis of the gene sequence and/or other data as described in the cited reference. The evidence codes in this category also indicate a varying degree of curatorial input. The computational analysis evidence codes are:

Inferred from Sequence or structural Similarity (ISS)
The ISS evidence code or one of its sub-categories should be used whenever a manual, sequence-based analysis forms the basis for an annotation. If the annotation has not been reviewed manually, the correct evidence code is IEA, even if the evidence supporting the annotation is all sequence based. ISS should be used if a combination of sequence-based tools or methods is used. If only one particular type of sequence-based evidence is used, then one of the more specific sub-categories of ISS may be more appropriate for the annotation.
An ISS annotation is often based on more than one type of sequence-based evidence. Often, a host of searches is performed for any given query protein. These searches might include BLAST, profile HMMs, TMHMM, SignalP, PROSITE, InterPro, etc. Evaluation of the output from these search tools (bearing in mind that not every search yields results for every protein) leads an annotator to a particular ISS annotation for a particular protein. For example, a BLAST search might reveal that a query protein matches an experimentally characterized protein from another species at 50% identity over the full lengths of both proteins. After reading literature about the match protein, the curator sees that the match protein is known to contain a domain located in the plasma membrane and another domain that extends into the cytoplasm. It is also known from the literature that the experimentally characterized match protein requires the binding of ATP to function. TMHMM analysis of the query protein predicts several membrane-spanning regions in one half of the protein (consistent with location in a membrane). In addition, there are PROSITE and Pfam results that reveal the presence of an ATP-binding domain in the other half of the protein, which TMHMM predicts to be cytoplasmic. These four search results taken together point to a probable identification of the query protein as having the function of the match protein.
Inferred from Sequence Orthology (ISO)
Inferred from Sequence Alignment (ISA)
Inferred from Sequence Model (ISM)
Inferred from Genomic Context (IGC)
Inferred from Reviewed Computational Analysis (RCA)
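
The choice between IEA, ISS and the ISS sub-categories described above can be sketched as a small decision function. This is a simplified illustration in Python, not an official curation rule: the function name choose_sequence_code and the evidence labels "orthology", "alignment" and "model" are assumptions made for the example, and the guide only says that a sub-category "may be more appropriate" when a single type of evidence is used.

```python
from typing import Iterable

# ISS sub-categories keyed by the single kind of sequence evidence they name.
# The keys are labels invented for this sketch.
ISS_SUBCATEGORY = {
    "orthology": "ISO",
    "alignment": "ISA",
    "model": "ISM",
}


def choose_sequence_code(manually_reviewed: bool, evidence_kinds: Iterable[str]) -> str:
    """Suggest an evidence code for a sequence-based annotation."""
    kinds = set(evidence_kinds)
    if not kinds:
        raise ValueError("at least one kind of sequence evidence is required")
    # Without manual review the code is IEA, even if all evidence is sequence based.
    if not manually_reviewed:
        return "IEA"
    # A single kind of evidence may map to a more specific sub-category.
    if len(kinds) == 1:
        (kind,) = kinds
        return ISS_SUBCATEGORY.get(kind, "ISS")
    # A combination of sequence-based tools or methods is plain ISS.
    return "ISS"
```

A manually reviewed annotation supported by both an alignment and a sequence model would be suggested as ISS, while the same evidence without manual review would be suggested as IEA.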

Electronic annotation evidence code

‘Electronic’ (IEA) annotations are not manually reviewed (although the method itself is usually subjected to various quality assessments). IEA-supported annotations are ultimately based on homology and/or other experimental or sequence information, but cannot generally be traced to an experimental source. Three methods make up the bulk of these annotations. The first, and most comprehensive, method is InterPro2GO, which is based on the curated association of a GO term with a generalized sequence model (‘signature’) of a group of homologous proteins. Protein sequences with a statistically significant match to a signature are assigned the GO terms associated with the signature, a form of homology inference. A second method is the computational conversion of UniProt controlled vocabulary terms (including Enzyme Commission numbers describing enzymatic activities, and UniProt keywords describing subcellular locations) to associated GO terms. Lastly, annotations are made based on 1:1 orthologs inferred from Ensembl gene trees, an approach which automatically transfers annotations found experimentally in one gene to its 1:1 orthologs in the same taxonomic clade (e.g. those within the vertebrate clade, and separately, those within the plant clade).

Inferred from Electronic Annotation (IEA)
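
The signature-based transfer described for InterPro2GO can be illustrated with a toy example. The sketch below is in Python and only shows the general idea; it is not the InterPro2GO implementation. The function name transfer_go_terms is hypothetical, and the signature, protein and term identifiers in the usage lines are placeholders rather than real accessions. It assumes that the statistical matching of proteins to signatures has already been done elsewhere.

```python
from typing import Dict, List, Set, Tuple


def transfer_go_terms(
    signature_to_go: Dict[str, Set[str]],
    protein_matches: Dict[str, Set[str]],
) -> List[Tuple[str, str, str]]:
    """Assign to each protein the GO terms of every signature it matches.

    signature_to_go: curated mapping from a signature to its GO terms.
    protein_matches: signatures each protein significantly matches
                     (the matching step itself is outside this sketch).
    Returns (protein, GO term, evidence code) tuples, all tagged IEA.
    """
    annotations = []
    for protein in sorted(protein_matches):
        terms: Set[str] = set()
        for signature in protein_matches[protein]:
            # Signatures without a curated GO association contribute nothing.
            terms.update(signature_to_go.get(signature, set()))
        for term in sorted(terms):
            annotations.append((protein, term, "IEA"))
    return annotations


# Placeholder identifiers, for illustration only.
example_mapping = {"SIG_A": {"TERM_1", "TERM_2"}, "SIG_B": {"TERM_2"}}
example_matches = {"PROT_X": {"SIG_A", "SIG_B"}, "PROT_Y": {"SIG_C"}}
example_annotations = transfer_go_terms(example_mapping, example_matches)
```

Every resulting annotation carries the IEA code because no curator reviews the individual protein-to-term assignments; only the signature-to-term mapping is curated.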
