[Annotation-2] ANNOVAR-4: Filter-based annotation

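Filter-based annotation checks whether each variant appears in a given database and reports the value stored there, such as an allele frequency or a prediction score. A variant is matched by its exact position and nucleotide change. The databases available in this mode are grouped below by purpose.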

For frequency of variants in whole-genome data:

  • 1000g2015aug: latest 1000 Genomes Project dataset with allele frequencies in six populations including ALL, AFR (African), AMR (Admixed American), EAS (East Asian), EUR (European), SAS (South Asian). These are whole-genome variants.
  • kaviar_20150923: latest Kaviar database with 170 million variants from 13K genomes and 64K exomes.
  • hrcr1: latest Haplotype Reference Consortium (HRC) database with 40 million variants from 32K samples.
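  • cg69: allele frequency in 69 human subjects sequenced by Complete Genomics. Complete Genomics provides whole-genome data for a relatively small group of healthy subjects. ANNOVAR currently offers cg46 and cg69. These give allele frequencies from 46 unrelated subjects and from 69 subjects that include related individuals (the 69 include the 46 unrelated subjects), respectively. The 46 subjects contribute only 92 chromosome copies at each autosomal locus, so a single allele already corresponds to a frequency of about 1.1%.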
  • gnomad_genome: allele frequency in gnomAD database whole-genome sequence data on multiple populations.
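Filter-based annotation is run with `annotate_variation.pl -filter`, where `-dbtype` names one of the keywords above. For 1000 Genomes, the keyword appends a lowercase population code to the dataset name. The minimal Python sketch below only assembles the argument lists. It assumes ANNOVAR's Perl scripts are on the PATH and the hg19 database goes into `humandb/`. The file names `example/ex1.avinput` and `ex1` are placeholders.

```python
import shlex

# Population codes available in 1000g2015aug (see the list above).
POPULATIONS_1000G = ["all", "afr", "amr", "eas", "eur", "sas"]


def download_command(dataset: str, buildver: str = "hg19",
                     dbdir: str = "humandb/") -> list[str]:
    """argv that downloads an ANNOVAR-hosted database into dbdir."""
    return ["annotate_variation.pl", "-buildver", buildver,
            "-downdb", "-webfrom", "annovar", dataset, dbdir]


def filter_command(dbtype: str, avinput: str, out_prefix: str,
                   buildver: str = "hg19", dbdir: str = "humandb/") -> list[str]:
    """argv for one filter-based annotation run against a single database."""
    return ["annotate_variation.pl", "-filter",
            "-dbtype", dbtype,
            "-buildver", buildver,
            "-out", out_prefix,
            avinput, dbdir]


# One dbtype keyword per 1000 Genomes population, e.g. "1000g2015aug_eur".
keywords = [f"1000g2015aug_{pop}" for pop in POPULATIONS_1000G]

print(shlex.join(download_command("1000g2015aug")))
print(shlex.join(filter_command("1000g2015aug_eur", "example/ex1.avinput", "ex1")))
```

To actually run a command, pass the list to `subprocess.run(cmd, check=True)`. Keeping the arguments as a list avoids shell-quoting problems with unusual file names.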

For frequency of variants in whole-exome data:

  • exac03: latest Exome Aggregation Consortium dataset with allele frequencies in ALL, AFR (African), AMR (Admixed American), EAS (East Asian), FIN (Finnish), NFE (Non-Finnish European), OTH (other), SAS (South Asian).
  • esp6500siv2: latest NHLBI-ESP project with 6500 exomes. Three separate keywords are used for the three population groupings: esp6500siv2_all (all subjects), esp6500siv2_ea (European Americans) and esp6500siv2_aa (African Americans).
  • gnomad_exome: allele frequency in gnomAD database whole-exome sequence data on multiple populations.
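A common downstream use of these frequency columns is keeping only rare variants. The sketch below makes several assumptions:
  • A tab-separated table from `table_annovar.pl` is parsed with `csv.DictReader`.
  • Missing values are written as `.`.
  • The column names `ExAC_ALL` and `gnomAD_exome_ALL` match your output, so check your own header.

The 1% cutoff and the toy rows are illustrative only.

```python
import csv
import io


def is_rare(row: dict[str, str], columns: list[str],
            max_af: float = 0.01, missing: str = ".") -> bool:
    """True if every frequency column is missing or below max_af.

    A missing value means the variant is absent from that database,
    which is treated here as no evidence that it is common.
    """
    for col in columns:
        value = row.get(col, missing)
        if value == missing:
            continue
        if float(value) >= max_af:
            return False
    return True


# Toy tab-separated table; real column names may differ in your output.
TOY_TABLE = (
    "Chr\tStart\tEnd\tRef\tAlt\tExAC_ALL\tgnomAD_exome_ALL\n"
    "1\t100\t100\tA\tG\t0.2500\t0.2410\n"
    "1\t200\t200\tC\tT\t.\t0.0003\n"
    "2\t300\t300\tG\tA\t.\t.\n"
)

rows = list(csv.DictReader(io.StringIO(TOY_TABLE), delimiter="\t"))
freq_cols = ["ExAC_ALL", "gnomAD_exome_ALL"]
rare = [(r["Chr"], r["Start"]) for r in rows if is_rare(r, freq_cols)]
print(rare)  # [('1', '200'), ('2', '300')]
```

Treating "absent from the database" as rare is a design choice. It keeps novel variants, which is usually what a rare-disease filter wants.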

For frequency of variants in isolated or under-represented populations:

  • ajews: common alleles in Ashkenazi Jews
  • TMC-SNPDB: common alleles in Indian populations
  • gme: GME (Greater Middle East Variome) allele frequency, including ALL, NWA (northwest Africa), NEA (northeast Africa), AP (Arabian peninsula), Israel, SD (Syrian desert), TP (Turkish peninsula) and CA (Central Asia).

For functional prediction of variants in whole-genome data:

  • gerp++: functional prediction scores for 9 billion mutations based on selective constraint across the human genome. You can optionally use gerp++gt2 instead. It includes only sites with an RS score greater than 2, which gives high sensitivity while still strongly enriching for truly constrained sites.
  • cadd: Combined Annotation Dependent Depletion score for 9 billion mutations. It is built with a support vector machine trained to distinguish 14.7 million high-frequency human-derived alleles from 14.7 million simulated variants, using about 70 different features. For known indels, use caddindel.
  • cadd13: CADD version 1.3.
  • dann: functional prediction score generated by deep learning, trained on the same data as cadd but with improved performance over cadd.
  • fathmm: a hidden Markov model that predicts the functional importance of both coding and non-coding variants on 9 billion mutations (two separate scores are provided).
  • eigen: a spectral approach that integrates functional genomic annotations for coding and noncoding variants on 9 billion mutations without labelled training data (that is, an unsupervised approach).
  • gwava: genome-wide annotation of variants that supports prioritization of noncoding variants by integrating various genomic and epigenomic annotations on 9 billion mutations.
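Several of these scores can be attached in one pass with ANNOVAR's `table_annovar.pl`. It takes a comma-separated `-protocol` list and a matching `-operation` list, in which `f` marks a filter-based database. The minimal sketch below only builds that argument list. It assumes the listed databases are already in `humandb/` for the chosen build. The input and output names are placeholders, and other operation types (such as `g` for gene-based) are left out.

```python
def table_annovar_command(avinput: str, protocols: list[str], out_prefix: str,
                          buildver: str = "hg19",
                          dbdir: str = "humandb/") -> list[str]:
    """argv for table_annovar.pl with every protocol run as filter-based ("f")."""
    if not protocols:
        raise ValueError("at least one protocol is required")
    return ["table_annovar.pl", avinput, dbdir,
            "-buildver", buildver,
            "-out", out_prefix,
            "-remove",                      # delete intermediate per-database files
            "-protocol", ",".join(protocols),
            "-operation", ",".join("f" for _ in protocols),  # one code per protocol
            "-nastring", "."]               # write missing values as "."


cmd = table_annovar_command("example/ex1.avinput",
                            ["dann", "eigen", "gwava"], "myanno")
print(" ".join(cmd))
```

Generating `-operation` from the protocol list keeps the two lists the same length. A length mismatch is an easy mistake to make when editing these options by hand.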

For functional prediction of variants in whole-exome data:

  • dbnsfp30a: this dataset already includes SIFT, PolyPhen2 HDIV, PolyPhen2 HVAR, LRT, MutationTaster, MutationAssessor, FATHMM, MetaSVM, MetaLR, VEST, CADD, GERP++, DANN, fitCons, PhyloP and SiPhy scores, but ONLY on coding variants

For functional prediction of splice variants:

  • dbscsnv11: dbscSNV version 1.1 for splice-site prediction by AdaBoost and Random Forest. The scores estimate how likely the variant is to affect splicing.
  • spidex: deep-learning-based prediction of splice variants. Unlike dbscsnv11, it also covers variants far from canonical splice sites.
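dbscsnv11 reports two scores per variant, one from each model. The sketch below flags a variant as potentially splice-altering when either score reaches a cutoff. The column names `dbscSNV_ADA_SCORE` and `dbscSNV_RF_SCORE` and the 0.6 cutoff are assumptions. Check them against your own output header and the dbscSNV documentation.

```python
SCORE_COLUMNS = ("dbscSNV_ADA_SCORE", "dbscSNV_RF_SCORE")  # assumed header names


def likely_splice_altering(row: dict[str, str], cutoff: float = 0.6,
                           missing: str = ".") -> bool:
    """True if either dbscSNV score is present and >= cutoff (assumed 0.6)."""
    for col in SCORE_COLUMNS:
        value = row.get(col, missing)
        if value != missing and float(value) >= cutoff:
            return True
    return False


print(likely_splice_altering({"dbscSNV_ADA_SCORE": "0.9998",
                              "dbscSNV_RF_SCORE": "0.904"}))  # True
print(likely_splice_altering({"dbscSNV_ADA_SCORE": ".",
                              "dbscSNV_RF_SCORE": "."}))      # False
```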

For disease-specific variants:

  • clinvar_20160302: ClinVar database with separate columns (CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID) for each variant (Please check the download page for the latest version, or read below for creating your own most updated version)
  • cosmic70: the latest COSMIC database with somatic mutations from cancer and the frequency of occurrence in each cancer subtype. For a more recent COSMIC version, see the instructions below on how to build it.
  • icgc21: International Cancer Genome Consortium version 21 mutations.
  • nci60: NCI-60 human tumor cell line panel exome sequencing allele frequency data
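The CLINSIG column of clinvar_20160302 holds the clinical significance assigned by submitters. The sketch below keeps variants with any pathogenic call. It assumes that multiple assertions are separated by `|` and that missing values are `.`. Verify both against your file, because the exact vocabulary and spelling changed across ClinVar releases. Spellings other than the two listed terms are not matched here.

```python
PATHOGENIC_TERMS = {"pathogenic", "likely pathogenic"}


def has_pathogenic_call(clinsig: str, missing: str = ".") -> bool:
    """True if any '|'-separated CLINSIG term is (likely) pathogenic."""
    if clinsig == missing:
        return False
    terms = {term.strip().lower() for term in clinsig.split("|")}
    return not terms.isdisjoint(PATHOGENIC_TERMS)


for value in ["Pathogenic", "Benign|Likely pathogenic",
              "Uncertain significance", "."]:
    print(value, has_pathogenic_call(value))
```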

For variant identifiers:

  • snp142: dbSNP version 142
  • snp138: dbSNP version 138
  • avsnp142: an abbreviated version of dbSNP 142 with left-normalization by the ANNOVAR developers. The avSNP datasets are essentially reformatted dbSNP datasets.
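A frequent use of the identifier databases is separating known variants from novel ones. The sketch below assumes the output column is named after the keyword (`avsnp142`) and holds either an rsID or `.`. The rsID in the toy rows is a placeholder, not a real dbSNP record.

```python
Row = dict[str, str]


def split_known_novel(rows: list[Row], column: str = "avsnp142",
                      missing: str = ".") -> tuple[list[Row], list[Row]]:
    """Partition rows into (known, novel) by whether an rsID was found."""
    known: list[Row] = []
    novel: list[Row] = []
    for row in rows:
        target = known if row.get(column, missing) != missing else novel
        target.append(row)
    return known, novel


rows = [{"id": "v1", "avsnp142": "rs123"},   # placeholder rsID
        {"id": "v2", "avsnp142": "."}]
known, novel = split_known_novel(rows)
print([r["id"] for r in known], [r["id"] for r in novel])  # ['v1'] ['v2']
```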

LJB* (dbNSFP) non-synonymous variant annotation

As of 2017, the database is dbnsfp33a; earlier versions were ljb26, ljb23, ljb2 and ljb. The output includes the following columns: SIFT_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV_pred Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred FATHMM_score FATHMM_pred PROVEAN_score PROVEAN_pred VEST3_score CADD_raw CADD_phred DANN_score fathmm-MKL_coding_score fathmm-MKL_coding_pred MetaSVM_score MetaSVM_pred MetaLR_score MetaLR_pred integrated_fitCons_score integrated_confidence_value GERP++_RS phyloP7way_vertebrate phyloP20way_mammalian phastCons7way_vertebrate phastCons20way_mammalian SiPhy_29way_logOdds
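Each tool has its own `*_pred` column, so a simple summary is how many tools call a variant damaging. The sketch below counts `D` calls among the non-missing `*_pred` columns. This is a simplification, because some tools use other letters: MutationTaster uses `A` and `D` for disease-causing, and MutationAssessor uses `H`/`M`/`L`/`N`. Adjust `damaging` per tool if you need more precision. The toy row is illustrative.

```python
def count_damaging(row: dict[str, str],
                   damaging: frozenset[str] = frozenset({"D"}),
                   missing: str = ".") -> tuple[int, int]:
    """Return (damaging calls, non-missing calls) over all *_pred columns."""
    n_damaging = n_called = 0
    for column, value in row.items():
        if not column.endswith("_pred") or value == missing:
            continue  # skip score columns and tools with no prediction
        n_called += 1
        if value in damaging:
            n_damaging += 1
    return n_damaging, n_called


toy = {"SIFT_score": "0.001", "SIFT_pred": "D",
       "Polyphen2_HDIV_pred": "D", "LRT_pred": "N",
       "MutationTaster_pred": "."}
print(count_damaging(toy))  # (2, 3)
```

Reporting the number of non-missing calls alongside the damaging count avoids treating "2 of 2" and "2 of 10" as equivalent.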

Reference: http://annovar.openbioinformatics.org/en/latest/user-guide/filter/#dbsnp-annotations
