Recommended Coverage and Read Depth for NGS Applications

In this guide we define sequencing coverage as the average number of reads that align to, or "cover", known reference bases, i.e. (number of reads x read length) / target size, assuming that reads are randomly distributed across the genome. Elsewhere, coverage has also been defined in terms of breadth (i.e. assembly size / target size) and as the empirical average depth of an assembly (i.e. number of reads x read length / assembly size).
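The definition above is simple arithmetic, and a minimal sketch may make it concrete. The function name `mean_coverage` and the run parameters are illustrative assumptions, not part of the original guide. For paired-end data, each mate is counted as one read.

```python
def mean_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Expected average depth: (number of reads x read length) / target size.

    Assumes reads are randomly distributed across the target, as in the
    definition above. All sizes are in bases.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return num_reads * read_length / target_size


# Hypothetical run: 600 million 150-base reads against a 3 Gb genome.
print(mean_coverage(600_000_000, 150, 3_000_000_000))  # 30.0
```

This is an expected average only; actual per-base depth varies around it.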

In general, more coverage means that each base is covered by a larger number of aligned sequence reads. Coverage and read requirements, however, can depend on several of the following parameters:

  1. Read length
  2. Genome size
  3. Application
  4. Established guidelines in the literature
  5. Gene expression level
  6. Genome complexity, repetitive regions
  7. Error rate of sequencing instrument or methodology
  8. Assembly algorithm

In the table below we address items 1-4. Adjust genome size, number of reads and read length to fit the organism you’re sequencing. The coverage values below apply to most organisms, while the read recommendations are for mammalian species with genome sizes of ~3 Gb. If you’re working with a smaller genome, you can scale down the number of reads proportionately to get an estimate. Note that the coverage required can depend heavily on the experiment you’re trying to perform. In many cases, biological replicates offer more value than a large number of reads for a single sample. The values below are what others in the field have found necessary and are meant to serve as a starting point. To determine most accurately the coverage an experiment needs, perform a sequencing saturation analysis.
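The proportional scaling described above can be sketched as follows. The function name `scale_reads` and the 120 Mb example genome are assumptions made for illustration. The sketch implements only the linear rule of thumb stated in the text and does not replace a saturation analysis.

```python
# ~3 Gb, the mammalian genome size the read recommendations assume.
MAMMALIAN_GENOME_BP = 3_000_000_000


def scale_reads(reads_for_mammal: float,
                genome_size_bp: float,
                reference_size_bp: float = MAMMALIAN_GENOME_BP) -> float:
    """Scale a read recommendation linearly with genome size."""
    if genome_size_bp <= 0 or reference_size_bp <= 0:
        raise ValueError("genome sizes must be positive")
    return reads_for_mammal * genome_size_bp / reference_size_bp


# Hypothetical example: a 20M-read recommendation applied to a 120 Mb genome.
print(scale_reads(20_000_000, 120_000_000))  # 800000.0
```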

This is an evolving coverage guide, meaning our goal is to improve it with new applications and citations. We’d love your feedback. You can contact us about this and other sequencing-related material at: science@genohub.com.

Table 1: Coverage and Read Recommendations by Application

| Category | Detection or Application | Recommended Coverage (x) or Reads (millions) | References |
|---|---|---|---|
| Whole genome sequencing | Homozygous SNVs | 15x | Bentley et al., 2008 |
| | Heterozygous SNVs | 33x | Bentley et al., 2008 |
| | INDELs | 60x | Feng et al., 2014 |
| | Genotype calls | 35x | Ajay et al., 2011 |
| | CNV | 1-8x | Xie et al., 2009; Medvedev et al., 2010 |
| Whole exome sequencing | Homozygous SNVs | 100x (3x local depth) | Clark et al., 2011; Meynert et al., 2013 |
| | Heterozygous SNVs | 100x (13x local depth) | Clark et al., 2011; Meynert et al., 2013 |
| | INDELs | not recommended | Feng et al., 2014 |
| Transcriptome Sequencing | Differential expression profiling | 10-25M | Liu Y. et al., 2014; ENCODE 2011 RNA-Seq |
| | Alternative splicing | 50-100M | Liu Y. et al., 2013; ENCODE 2011 RNA-Seq |
| | Allele specific expression | 50-100M | Liu Y. et al., 2013; ENCODE 2011 RNA-Seq |
| | De novo assembly | >100M | Liu Y. et al., 2013; ENCODE 2011 RNA-Seq |
| DNA Target-Based Sequencing | ChIP-Seq | 10-14M (sharp peaks); 20-40M (broad marks) | Rozowsky et al., 2009; ENCODE 2011 Genome; Landt et al., 2012 |
| | Hi-C | 100M | Belton, J.M. et al., 2012 |
| | 4C (Circular Chromosome Conformation Capture) | 1-5M | van de Werken, H.J.G. et al., 2012 |
| | 5C (Chromosome Conformation Capture Carbon Copy) | 15-25M | Sanyal A. et al., 2012 |
| | ChIA-PET (Chromatin Interaction Analysis by Paired-End Tag Sequencing) | 15-20M | Zhang, J. et al., 2012 |
| | FAIRE-Seq | 25-55M | ENCODE 2011 Genome; Landt et al., 2012 |
| | DNase I-Seq | 25-55M | Landt et al., 2012 |
| DNA Methylation Sequencing | CAP-Seq | >20M | Long, H.K. et al., 2013 |
| | MeDIP-Seq | 60M | Taiwo, O. et al., 2012 |
| | RRBS (Reduced Representation Bisulfite Sequencing) | 10x | ENCODE 2011 Genome |
| | Bisulfite-Seq | 5-15x; 30x | Ziller, M.J. et al., 2015; Epigenomics Road Map |
| RNA-Target-Based Sequencing | CLIP-Seq | 10-40M | Cho J. et al., 2012; Eom T. et al., 2013; Sugimoto Y. et al., 2012 |
| | iCLIP | 5-15M | Sugimoto Y. et al., 2012; Rogelj B. et al., 2012 |
| | PAR-CLIP | 5-15M | Rogelj B. et al., 2012 |
| | RIP-Seq | 5-20M | Lu Z. et al., 2014 |
| Small RNA (microRNA) Sequencing | Differential Expression | ~1-2M | Metpally RPR et al., 2013; Campbell et al., 2015 |
| | Discovery | ~5-8M | Metpally RPR et al., 2013; Campbell et al., 2015 |
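
The coverage values in Table 1 can be turned into a read count by inverting the guide's coverage definition: reads = coverage x target size / read length. The sketch below does this. The function name `reads_needed` and the 150-base read length are illustrative assumptions, and the result is an estimate that ignores duplicates, unmapped reads and uneven coverage.

```python
import math


def reads_needed(coverage: float, target_size: int, read_length: int) -> int:
    """Reads required to reach a target average coverage.

    Inverts coverage = (number of reads x read length) / target size
    and rounds up to a whole read.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(coverage * target_size / read_length)


# Heterozygous SNV detection at 33x on a ~3 Gb genome,
# with hypothetical 150-base reads.
print(reads_needed(33, 3_000_000_000, 150))  # 660000000
```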

References:

  • Ajay, S.S. et al. Accurate and comprehensive sequencing of personal genomes. Genome Research 21, 1498 (2011).
  • Belton, J.M. et al., Hi-C: a comprehensive technique to capture the conformation of genomes. Methods, 58, 221-230 (2012).
  • Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
  • Campbell J.D. et al., Assessment of microRNA differential expression and detection in multiplexed small RNA sequencing data. RNA 21, 164-171 (2015).
  • Cho J. et al., LIN28A Is a Suppressor of ER-Associated Translation in Embryonic Stem Cells. Cell 151, 765-777 (2012).
  • Clark, M. J. et al. Performance comparison of exome DNA sequencing technologies. Nature Biotech. 29, 908–914 (2011).
  • ENCODE 2011 Genome Guidelines
  • ENCODE 2011 RNA-Seq Guidelines
  • Eom T. et al., NOVA-dependent regulation of cryptic NMD exons controls synaptic protein levels after seizure. eLife 2, e00178 (2013).
  • Epigenomics Road Map Guidelines
  • Feng, H. et al. Reducing INDEL calling errors in whole genome and exome sequencing data. Genome Medicine 6, 89 (2014).
  • Landt, S.G. et al., ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research, 22, 1813-1831 (2012).
  • Liu Y., et al., RNA-seq differential expression studies: more sequence or more replication? Bioinformatics 30(3):301-304 (2014).
  • Liu Y., et al., Evaluating the impact of sequencing depth on transcriptome profiling in human adipose. PLoS ONE 8(6):e66883 (2013).
  • Long, H.K. et al., Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates. eLife 2, e00348 (2013).
  • Lu Z. et al., RIP-seq analysis of eukaryotic Sm proteins identifies three major categories of Sm-containing ribonucleoproteins. Genome Biology 15:R7 (2014).
  • Meynert et al., Quantifying single nucleotide variant detection sensitivity in exome sequencing. BMC Bioinformatics 14, 195 (2013).
  • Medvedev, P. et al. Detecting copy number variation with mated short reads. Genome Research 20, 1613 (2010).
  • Metpally RPR et al., Comparison of Analysis Tools for miRNA High Throughput Sequencing Using Nerve Crush as a Model. Frontiers in Genetics 4:20 (2013).
  • Rogelj B. et al., Widespread binding of FUS along nascent RNA regulates alternative splicing in the brain. Scientific Reports 2, 603 (2012).
  • Rozowsky, J. et al., PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nature Biotech. 27, 65-75 (2009).
  • Sanyal, A. et al., The long-range interaction landscape of gene promoters. Nature, 489, 109-113 (2012).
  • Sugimoto Y et al., Analysis of CLIP and iCLIP methods for nucleotide-resolution studies of protein-RNA interactions. Genome Biology 13:R67 (2012).
  • Taiwo, O. et al., Methylome analysis using MeDIP-seq with low DNA concentrations. Nature Protocols 7, 617-636 (2012).
  • van de Werken, H.J.G. et al., Robust 4C-seq data analysis to screen for regulatory DNA interactions. Nature Methods 9, 969-972 (2012).
  • Xie, C. & Tammi, M. T. CNV–seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 10, 80 (2009).
  • Zhang, J. et al., ChIA-PET analysis of transcriptional chromatin interactions. Methods 58, 289-299 (2012).
  • Ziller, M.J. et al., Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nature Methods 12, 230-232 (2015).

from https://genohub.com/recommended-sequencing-coverage-by-application/
