Title: Genomic encoding of transcriptional burst kinetics
Authors and affiliations:
Anton J. M. Larsson, Per Johnsson, Michael Hagemann-Jensen, Leonard Hartmanis, Omid R. Faridani, Björn Reinius, Åsa Segerstolpe, Chloe M. Rivera, Bing Ren & Rickard Sandberg
Rickard Sandberg
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Ludwig Institute for Cancer Research, Stockholm, Sweden
- Integrated Cardio Metabolic Center (ICMC), Karolinska Institutet, Stockholm, Sweden
Journal and publication date:
Nature (2019) Published: 02 January 2019
Abstract:
Mammalian gene expression is inherently stochastic [1,2], and results in discrete bursts of RNA molecules that are synthesized from each allele [3,4,5,6,7]. Although transcription is known to be regulated by promoters and enhancers, it is unclear how cis-regulatory sequences encode transcriptional burst kinetics. Characterization of transcriptional bursting, including the burst size and frequency, has mainly relied on live-cell [4,6,8] or single-molecule RNA fluorescence in situ hybridization [3,5,8,9] recordings of selected loci. Here we determine transcriptome-wide burst frequencies and sizes for endogenous mouse and human genes using allele-sensitive single-cell RNA sequencing. We show that core promoter elements affect burst size and uncover synergistic effects between TATA and initiator elements, which were masked at mean expression levels. Notably, we provide transcriptome-wide evidence that enhancers control burst frequencies, and demonstrate that cell-type-specific gene expression is primarily shaped by changes in burst frequencies. Together, our data show that burst frequency is primarily encoded in enhancers and burst size in core promoters, and that allelic single-cell RNA sequencing is a powerful model for investigating transcriptional kinetics.
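To make "burst frequency" and "burst size" concrete, the following is a minimal sketch of the standard two-state (telegraph) model of bursting, not the authors' inference code. It assumes the usual steady-state result that per-allele counts follow a beta-Poisson distribution when all rates are expressed in units of the mRNA degradation rate. The parameter name `k_syn` (synthesis rate while the gene is active) and both function names are illustrative assumptions; under this model burst frequency corresponds to `k_on` and burst size to `k_syn / k_off`.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small rates used in this sketch (lam up to a few hundred).
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_two_state(k_on, k_off, k_syn, n_cells, seed=0):
    """Sample steady-state RNA counts for one allele across n_cells cells.

    Assumption: rates are in units of the mRNA degradation rate, so the
    steady state is beta-Poisson: p ~ Beta(k_on, k_off), x ~ Poisson(k_syn * p).
    """
    rng = random.Random(seed)
    return [
        sample_poisson(k_syn * rng.betavariate(k_on, k_off), rng)
        for _ in range(n_cells)
    ]


def burst_summary(k_on, k_off, k_syn):
    """Summaries implied by the two-state model parameters."""
    return {
        "burst_frequency": k_on,           # bursts per mRNA lifetime
        "burst_size": k_syn / k_off,       # mean molecules per burst
        "mean_expression": k_syn * k_on / (k_on + k_off),
    }


counts = sample_two_state(k_on=0.5, k_off=5.0, k_syn=50.0, n_cells=20000)
summary = burst_summary(0.5, 5.0, 50.0)
```

With a large enough number of simulated cells, the sample mean of `counts` should approach `summary["mean_expression"]`. Two genes can share that mean while differing in frequency and size, which is why mean expression alone can mask kinetic differences.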
Selected figures:
Fig. 1: Transcriptome-wide inference of transcriptional burst kinetics.
a, Allele-resolution kinetics inferred from scRNA-seq data. The total expression for the Mbln2 gene (top) was separated into allelic expression (maternal: middle; paternal: bottom). Inference was performed independently on total expression and allele-level expression to illustrate that allele-level inference has the required resolution, with expression measured as observed RNA molecules.
b, Inferred burst kinetics for each gene (CAST allele) in primary fibroblasts (red dots, 7,186 genes). Blue contours indicate the inference precision, defined as the width of the confidence interval divided by the point estimate from simulated observations (Supplementary Methods). Burst size is given in units of observed RNA molecules.
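The precision measure in panel b is a simple ratio, sketched below. The function name and the example numbers are hypothetical; how the confidence interval itself is obtained is described in the paper's Supplementary Methods and is not reproduced here.

```python
def relative_ci_width(ci_lower, ci_upper, point_estimate):
    """Width of a confidence interval divided by the point estimate.

    Smaller values indicate a more precise estimate.
    """
    if point_estimate <= 0:
        raise ValueError("point_estimate must be positive")
    if ci_upper < ci_lower:
        raise ValueError("ci_upper must not be smaller than ci_lower")
    return (ci_upper - ci_lower) / point_estimate


# Hypothetical example: a burst size estimate of 10 with interval (8, 12).
precision = relative_ci_width(8.0, 12.0, 10.0)
```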
c, Histogram of inferred burst frequencies for the CAST allele in primary fibroblasts, expressed on the timescale of the mRNA degradation rate.
d, Histogram of inferred burst sizes (observed RNA molecules) for the CAST allele in primary fibroblasts.
e, Scatter plot comparing inferred burst frequencies with gene-specific mRNA degradation rates (x axis) against inferred burst frequencies that did not use gene-specific mRNA degradation rates (using the average degradation rate for all genes instead). Genes with the 50 longest (green) and shortest (red) mRNA degradation rates are marked. Data from ES cells and the CAST allele.
f, Histogram of allele-level waiting times between bursts (data from ES cells and the CAST allele).
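Because burst frequencies are inferred on the timescale of mRNA degradation (panel c), a degradation rate is needed to express them in absolute time (panels e and f). The sketch below shows one way this conversion could be done. It assumes first-order mRNA decay and approximates the mean waiting time between bursts as the inverse of the absolute burst frequency, which is reasonable when active periods are short. The function names and example values are hypothetical.

```python
import math


def degradation_rate_from_half_life(half_life_hours):
    """First-order decay rate (per hour) from an mRNA half-life in hours."""
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    return math.log(2) / half_life_hours


def mean_waiting_time_hours(burst_frequency, degradation_rate_per_hour):
    """Approximate mean time between bursts in hours.

    burst_frequency is in units of the degradation rate (bursts per mRNA
    lifetime); multiplying by the degradation rate gives bursts per hour.
    """
    if burst_frequency <= 0 or degradation_rate_per_hour <= 0:
        raise ValueError("both arguments must be positive")
    return 1.0 / (burst_frequency * degradation_rate_per_hour)


# Hypothetical gene: 0.5 bursts per mRNA lifetime, 4-hour half-life.
delta = degradation_rate_from_half_life(4.0)
waiting_time = mean_waiting_time_hours(0.5, delta)
```

Using a gene-specific degradation rate rather than a global average changes only the scaling factor applied to each gene, which is the comparison shown in panel e.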
g, Scatter plot showing the inferred gene inactivation (koff) and activation (kon) rates, highlighting that genes have higher koff than kon values. Data from fibroblasts and the CAST allele.
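Why koff exceeding kon implies bursting can be seen from the standard two-state model, sketched here as an illustration rather than as the authors' derivation. The symbol $k_{syn}$ (synthesis rate while the gene is active) is an assumption not named in the legend above, and all rates are taken in units of the mRNA degradation rate.

```latex
% Fraction of time an allele spends in the active state
P_{\mathrm{on}} = \frac{k_{on}}{k_{on} + k_{off}}

% Steady-state mean expression per allele
\langle n \rangle = k_{syn}\, P_{\mathrm{on}}
                  = k_{syn}\, \frac{k_{on}}{k_{on} + k_{off}}

% When k_{off} \gg k_{on}, active periods are short and rare, and the mean
% factorizes into burst frequency times burst size
\langle n \rangle \approx k_{on} \cdot \frac{k_{syn}}{k_{off}}
```

In this regime the allele is mostly inactive, and expression is approximately the product of how often bursts occur and how many molecules each burst yields.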
Fig. 2: Core promoter elements dictate transcriptional burst size.
a, Illustration of gene categorization (and colouring) according to TATA and initiator (Inr) elements in core promoters.
b, Burst size (from linear model) for genes, ordered and coloured based on core promoter elements (n = 6,935 genes, F-test).
c, The dependency between burst size and gene length for the gene categories. Burst size is predicted from the linear model, with shaded areas showing the 95% confidence intervals for the prediction; genes are ordered (ascending) according to gene length.
Fig. 3: Enhancers regulate burst frequencies to shape cell-type-specific expression.
a, b, Scatter plots of transcriptome-wide inferred transcriptional burst frequencies (a) and sizes (b) in mouse ES cells and adult tail fibroblasts (n = 4,854 genes; C57 allele). Genes with significant differences (profile likelihood test, FDR < 0.05) between cell types are marked in red.
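The legend reports an FDR threshold but does not say which correction was used. As an illustration only, the sketch below applies the Benjamini-Hochberg procedure, a common choice that is assumed here and not confirmed by the source. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used purely as an example of FDR control; the source
    does not state which procedure produced its FDR values.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a test of cell-type differences.
pvals = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(pvals)
significant = [q < 0.05 for q in fdr]
```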
c, Graph depicting cell-type differences in burst frequency and size, as a function of fold changes in mean expression between cell types. Lines represent the median fold change in burst size and frequency between cell types for genes binned by expression difference (n genes per bin = 100).
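The binning described in panel c can be sketched as follows: genes are sorted by their fold change in mean expression, grouped into consecutive bins, and the median fold change in burst frequency and burst size is taken per bin. This is a minimal illustration with hypothetical function and argument names, not the authors' analysis code. The last bin may hold fewer genes when the total is not a multiple of the bin size.

```python
from statistics import median


def binned_medians(mean_fc, freq_fc, size_fc, genes_per_bin=100):
    """Median fold changes per bin, with genes ordered by mean-expression fold change.

    Returns one tuple per bin:
    (median mean-expression FC, median burst-frequency FC, median burst-size FC).
    """
    if not (len(mean_fc) == len(freq_fc) == len(size_fc)):
        raise ValueError("all inputs must have the same length")
    order = sorted(range(len(mean_fc)), key=lambda i: mean_fc[i])
    rows = []
    for start in range(0, len(order), genes_per_bin):
        idx = order[start:start + genes_per_bin]
        rows.append((
            median(mean_fc[i] for i in idx),
            median(freq_fc[i] for i in idx),
            median(size_fc[i] for i in idx),
        ))
    return rows


# Toy example with six genes and three genes per bin.
rows = binned_medians(
    mean_fc=[3, 1, 2, 6, 5, 4],
    freq_fc=[30, 10, 20, 60, 50, 40],
    size_fc=[1, 1, 1, 1, 1, 1],
    genes_per_bin=3,
)
```

In this toy input the frequency fold change tracks the mean-expression fold change while the size fold change stays flat, which mirrors the pattern the figure reports for cell-type differences.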
d, Graph depicting cell-type differences in enhancer magnitude (H3K27ac read densities in enhancers) for genes ordered by cell-type differences in either burst frequency or size. Computed as a rolling median in groups of 200 genes.
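A rolling median over ordered genes, as used in panel d, could be computed along these lines. The sketch assumes a simple sliding window that advances one gene at a time; the source does not specify the step or edge handling, so those choices, and the function names, are assumptions.

```python
from statistics import median


def rolling_median(values, window=200):
    """Median over each run of `window` consecutive values (step of one)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        median(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


def ordered_rolling_median(order_key, values, window=200):
    """Sort `values` by `order_key` (for example, the difference in burst
    frequency between cell types), then take the rolling median."""
    if len(order_key) != len(values):
        raise ValueError("order_key and values must have the same length")
    ranked = [v for _, v in sorted(zip(order_key, values), key=lambda kv: kv[0])]
    return rolling_median(ranked, window)


# Toy example: enhancer differences ordered by burst-frequency differences.
smoothed = ordered_rolling_median([3, 1, 2], [30, 10, 20], window=2)
```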
e, Validation of scRNA-seq-inferred cell-type differences in transcriptional burst kinetics by smFISH on four genes (Hdac6, Msl3, Mpp1 and Igbp1). The left heat map denotes effect size and direction of change, and the right heat map shows the significance level of cell-type difference in burst kinetics, separated by method, gene and burst kinetic parameter (profile likelihood test). For more information see Extended Data Fig. 7.
Extended Data Fig. 10: Inference of kinetics in different phases of the cell cycle.
a, c, Comparisons of inferred burst frequency (a) and size (c) for the C57 allele in fibroblasts with cells classified according to cell cycle phase. Scatter plots of burst frequency and size are shown for comparisons between S and G1 (a) and S and G2/M (c) phases.
b, d, The Gene Ontology (GO) terms that are enriched in the group of genes with significant differential burst frequency between S and G1 (b) and S and G2/M (d) (n = 116 genes with differential burst frequency in b and 75 genes in d).
Translation team:
王俊豪、陈志荣、邓峻玮、黄敬潼、郑凌伶