Key Hi-C literature
First Hi-C paper: Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome, doi: 10.1126/science.1181369
TADs first described: Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions, doi: 10.1038/nature11082
High-resolution Hi-C: A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping, https://doi.org/10.1016/j.cell.2014.11.021
Single-cell Hi-C: Single-cell Hi-C reveals cell-to-cell variability in chromosome structure, doi: 10.1038/nature12593; 3D structures of individual mammalian genomes studied by single-cell Hi-C, doi: 10.1038/nature21429
Review: the Hi-C background covered in the lecture comes mainly from the review Organization and function of the 3D genome, doi: 10.1038/nrg.2016.112
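Chromatin interactions at different resolutions
Different Hi-C resolutions reveal different features
5 kb: individual loops become visible
10 kb: TADs become visible
50 kb: relationships between TADs become visible
At the whole-chromosome level: the spatial distribution of chromatin becomes visible
What gives rise to TADs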
cohesin complex
Cohesin is a protein complex that regulates the separation of sister chromatids during cell division, either mitosis or meiosis.
Cohesins hold sister chromatids together after DNA replication until anaphase when removal of cohesin leads to separation of sister chromatids.
CTCF proteins
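CTCF, a transcriptional repressor
Binding of CTCF to its target sequence elements can block enhancer-promoter interactions, restricting enhancer activity to a defined functional domain
Besides blocking enhancers, CTCF can also act as a chromatin barrier that prevents heterochromatin from spreading
Predicting enhancer-promoter loops (EPL)
Two similar algorithms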
TargetFinder (Whalen et al. Nat Genet 2016): an algorithm that uses many functional genomic datasets, including DNase-seq, histone marks, transcription factor (TF) ChIP-seq, gene expression, and DNA methylation data.
Enhancer–promoter interactions are encoded by complex genomic signatures on looping chromatin, doi: 10.1038/ng.3539
pipeline
RIPPLE (Roy et al. NAR 2015): also uses functional genomic datasets for feature extraction.
- A predictive modeling approach for cell line-specific long-range regulatory interactions, https://doi.org/10.1093/nar/gkv865
Findings common to both
- Signals from these functional genomic data are informative enough to computationally distinguish interacting enhancer-promoter pairs from non-interacting ones.
PEP: uses only sequence information for the analysis (Jian Ma lab)
Hi-C analysis workflow
Analysis methods for studying the 3D architecture of the genome, https://doi.org/10.1186/s13059-015-0745-7
Workflow
contact map
Definition: A contact map is a matrix with rows and columns representing non-overlapping 'bins' across the genome.
Each entry in the matrix contains a count of read pairs that connect the corresponding bin pair in a Hi-C experiment.
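The definition above can be sketched in a few lines of Python. This is only an illustration for a single chromosome: the function name `build_contact_map`, the input format (a list of position pairs), and the toy coordinates are assumptions, not part of any Hi-C tool.

```python
import numpy as np

def build_contact_map(pairs, chrom_length, bin_size):
    """Count read pairs per bin pair on one chromosome (hypothetical helper)."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = np.zeros((n_bins, n_bins), dtype=np.int64)
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i, j] += 1
        if i != j:
            matrix[j, i] += 1  # mirror the count so the map stays symmetric
    return matrix

# Toy read pairs (made-up coordinates) on a 10 kb chromosome, 5 kb bins
pairs = [(100, 5200), (300, 5900), (1200, 1800), (9000, 200)]
cmap = build_contact_map(pairs, chrom_length=10_000, bin_size=5_000)
print(cmap)
```

With 5 kb bins the toy chromosome has two bins; three of the four read pairs connect bin 0 with bin 1, and one falls entirely inside bin 0.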
How to determine bin size
- No standard rule. Rao et al. 2014 suggest using the smallest bin size at which at least 80% of all bins have at least 1,000 contacts.
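One way to apply the Rao et al. criterion is sketched below, under stated assumptions: `coverage` holds the number of read ends per bin at some fine base resolution, coarser bins are obtained by merging adjacent base bins, and the helper names `fraction_well_covered` and `pick_bin_size` are hypothetical. The thresholds (1,000 contacts, 80%) are the ones quoted above.

```python
import numpy as np

def fraction_well_covered(coverage, factor, min_contacts=1000):
    """Merge `factor` adjacent base bins; return the fraction with >= min_contacts."""
    coverage = np.asarray(coverage)
    n = len(coverage) // factor * factor  # drop an incomplete trailing bin
    merged = coverage[:n].reshape(-1, factor).sum(axis=1)
    return float((merged >= min_contacts).mean())

def pick_bin_size(coverage, base_bin, factors, min_contacts=1000, min_fraction=0.8):
    """Return the smallest candidate bin size that meets the criterion, else None."""
    for factor in sorted(factors):
        if fraction_well_covered(coverage, factor, min_contacts) >= min_fraction:
            return base_bin * factor
    return None

# Toy data: 100 base bins of 1 kb, each with 300 read ends (made-up numbers)
coverage = np.full(100, 300)
best = pick_bin_size(coverage, base_bin=1000, factors=[1, 2, 5, 10])
print(best)
```

In this toy case 1 kb and 2 kb bins hold only 300 and 600 read ends each, so the first candidate to pass is 5 kb.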
Two types of approaches to correct bias in the contact map
Explicit approach: assuming some known bias
- Restriction enzyme fragment lengths, GC content, and sequence mappability are three major sources of biases in Hi-C data (Yaffe and Tanay, Nat Genet 2011)
- HiCNorm: simpler and faster (Hu et al. Bioinformatics 2012)
Implicit approach: assume no known source of bias and that each locus receives equal sequence coverage after biases are removed
- In other words, if there is no bias, the total genome-wide contact summation for each locus will be a constant, i.e., each locus has 'equal visibility'
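The 'equal visibility' assumption can be turned into a simple matrix-balancing loop. The sketch below follows the spirit of iterative correction (ICE, Imakaev et al. 2012), which these notes do not name; the function name, iteration count, tolerance, and toy matrix are all assumptions, and real tools additionally filter low-coverage bins first.

```python
import numpy as np

def iterative_correction(matrix, n_iter=50, tol=1e-6):
    """Balance a symmetric contact map so that all row sums become equal.

    Returns the corrected matrix and the per-bin bias vector, such that
    corrected[i, j] = matrix[i, j] / (bias[i] * bias[j]).
    """
    m = np.asarray(matrix, dtype=float).copy()
    bias = np.ones(m.shape[0])
    for _ in range(n_iter):
        row_sums = m.sum(axis=1)
        nonzero = row_sums > 0
        scale = np.ones_like(row_sums)
        # Relative visibility of each bin compared with the average bin
        scale[nonzero] = row_sums[nonzero] / row_sums[nonzero].mean()
        m /= np.outer(scale, scale)
        bias *= scale
        if np.abs(scale - 1).max() < tol:
            break
    return m, bias

# Toy example: a "true" map with equal row sums, distorted by a known bias
true_map = np.array([[2.0, 1.0, 1.0],
                     [1.0, 2.0, 1.0],
                     [1.0, 1.0, 2.0]])
true_bias = np.array([1.0, 2.0, 0.5])
observed = np.outer(true_bias, true_bias) * true_map

corrected, bias = iterative_correction(observed)
print(corrected.sum(axis=1))
```

After correction every row of the toy matrix sums to the same value, and the estimated bias vector is proportional to the bias used to distort the map.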
Contact matrix normalization
How to normalize
Algorithms for identifying TADs
HMM (Bing Ren lab)
Arrowhead