Non-canonical open reading frames (ORFs) 非经典的开放阅读框
the ~20,000 canonical, well-annotated genes 两万个经典的注释良好的基因
ribosome-associated RNAs 核糖体相关的RNA
inflammatory bowel disease 炎症性肠病 comorbidity 合并症
chronic disorders 慢性疾病 common pathophysiological processes 常见的病理生理过程
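The BioMe BioBank is a biobank linked to electronic medical records, enabling researchers to conduct genetic, epidemiological, molecular, and other studies quickly and efficiently.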
Optimal Unified Approach 最优统一方法
burden test 负荷检验 same direction 相同方向
sequence kernel association test 序列核关联检验 different direction 不同方向 nonburden tests 非负荷检验 kernel matrix 核矩阵 SNP-SNP interactions SNP-SNP 相互作用
dichotomous 二分类的 acute-lung-injury 急性肺损伤 etiology 病因学 implicitly 隐含地 magnitudes 程度
Single-variant tests 单变异检验
rare-variant effects 罕见变异效应 low frequencies 低频 large numbers 数目多 cumulative effects 累积效应
the cohort allelic sum test 队列等位基因总和检验 weighted sum test (WST) 加权和检验
substantial 重大的 attractive power 吸引力 adipose cells 脂肪细胞
SKAT-O automatically behaves like the burden test when the burden test is more powerful than SKAT, and behaves like SKAT when SKAT is more powerful than the burden test. 当负荷检验的检验效能高于 SKAT 时,SKAT-O 自动表现得像负荷检验;而当 SKAT 的检验效能高于负荷检验时,SKAT-O 自动表现得像 SKAT。
kurtosis 峰度 subjects 受试者 asymptotically 渐近地 notation 符号 foregoing 前面的 substantial interest 浓厚的兴趣 elucidate 阐明 leptin, a hormone that regulates appetite 瘦素,一种调控食欲的激素
the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) 美国国家心肺血液研究所外显子组测序项目
the acute lung injury (ALI) exome-sequencing data 急性肺损伤 (ALI) 外显子组测序数据
the chronic Pseudomonas aeruginosa infection 慢性铜绿假单胞菌感染
SKAT-O outperforms SKAT and the burden test
additive genetic model 加性遗传模型 reduce the df and increase analysis power 降低自由度并提高分析能力
SKAT is a variance-component test under the induced logistic mixed model SKAT 是由此导出的逻辑斯蒂混合模型下的方差分量检验
n×n kernel matrix;核矩阵;n×m genotype matrix;基因型矩阵;m×m diagonal weight matrix;对角权重矩阵
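As a minimal sketch of how the three matrices above fit together, the snippet below builds a weighted linear kernel from a toy genotype matrix and compares the burden and SKAT statistics. It assumes the SKAT-O style family Q_ρ = (1 − ρ)·Q_SKAT + ρ·Q_burden, with Q_SKAT = Σ (w_j S_j)² and Q_burden = (Σ w_j S_j)², and a kernel K = G W² Gᵀ (some presentations absorb the square into W). All function names and the toy data are hypothetical, and p-value calculation is deliberately left out.

```python
import math

def score_vector(G, y, mu):
    """Per-variant score S_j = sum_i g_ij * (y_i - mu_i)."""
    n, m = len(G), len(G[0])
    return [sum(G[i][j] * (y[i] - mu[i]) for i in range(n)) for j in range(m)]

def kernel_matrix(G, w):
    """n x n weighted linear kernel K = G W^2 G^T (assumed weight convention)."""
    n, m = len(G), len(G[0])
    return [[sum(G[i][j] * w[j] ** 2 * G[k][j] for j in range(m))
             for k in range(n)] for i in range(n)]

def q_burden(S, w):
    """Burden statistic: collapse the weighted scores first, then square."""
    return sum(wj * sj for wj, sj in zip(w, S)) ** 2

def q_skat(S, w):
    """SKAT statistic: square each weighted score first, then sum."""
    return sum((wj * sj) ** 2 for wj, sj in zip(w, S))

def q_rho(S, w, rho):
    """Family used by SKAT-O: rho = 0 gives SKAT, rho = 1 gives burden."""
    return (1.0 - rho) * q_skat(S, w) + rho * q_burden(S, w)

# Toy data (hypothetical): 4 subjects, 3 rare variants, binary trait.
G = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1],
     [0, 0, 0]]
y = [1, 1, 0, 0]
mu = [sum(y) / len(y)] * len(y)   # null model with intercept only
w = [1.0, 1.0, 1.0]               # flat weights for simplicity

S = score_vector(G, y, mu)
K = kernel_matrix(G, w)
resid = [yi - mi for yi, mi in zip(y, mu)]
q_from_kernel = sum(resid[i] * K[i][k] * resid[k]
                    for i in range(len(y)) for k in range(len(y)))

# The kernel form and the score form of the SKAT statistic agree.
print(math.isclose(q_from_kernel, q_skat(S, w)))
print(q_burden(S, w), q_skat(S, w), q_rho(S, w, 0.5))
```

In this toy example two variants are carried by cases and one by a control, so the scores have mixed signs: the burden statistic partly cancels while the SKAT statistic does not, which is the "same direction" versus "different direction" contrast noted earlier.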
When the trait is binary and sample sizes are small (e.g., hundreds of subjects), 性状为二分类并且样本量仅为几百个时,large-sample p value calculations can lead to incorrect type I error control and power loss. 基于大样本的 p 值计算会导致 I 类错误率控制不当并且丧失检验效能
small-sample-adjusted p value calculations for SKAT and SKAT-O 两种方式都有小样本量校正的p值
asymptotic null distribution 渐近零分布 the calibrated coalescent model 校准的溯祖模型
N, Beta(1, 25) weights 权重;equal-sized grids 等大小的网格
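To make the Beta(1, 25) weighting concrete, here is a small sketch that evaluates the Beta density at each variant's minor allele frequency (MAF), so that rarer variants receive larger weights. It assumes the weight is the density itself; some formulations define the square root of the weight this way instead. The function names and the MAF values are hypothetical.

```python
import math

def beta_density(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 <= x <= 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def beta_weights(mafs, a=1.0, b=25.0):
    """Per-variant weights from MAF; with a=1, b=25 the weight shrinks as MAF grows."""
    return [beta_density(p, a, b) for p in mafs]

# Hypothetical MAFs, from very rare to common.
mafs = [0.0005, 0.005, 0.05]
weights = beta_weights(mafs)
for p, wt in zip(mafs, weights):
    print(f"MAF={p:<7} weight={wt:.3f}")
```

With a = 1 and b = 25 the density reduces to 25·(1 − MAF)²⁴, so the weight is largest for the rarest variants and decays quickly as MAF increases.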
Note that α = 2.5 × 10−6 is the Bonferroni-adjusted level corresponding to α = 0.05 when simultaneously testing 20,000 genes.
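The arithmetic behind this threshold can be checked with a short sketch: the Bonferroni correction divides the family-wise level by the number of tests. The function name is hypothetical.

```python
import math

def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level that keeps the family-wise level at family_alpha."""
    return family_alpha / n_tests

# Gene-based testing: one test per gene, about 20,000 genes.
alpha_gene = bonferroni_alpha(0.05, 20_000)
print(alpha_gene)
print(math.isclose(alpha_gene, 2.5e-6))
```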
Figure: Manhattan plots of common- and rare-variant associations for T2D. First panel: GWAS results for common variants (MAF ≥ 0.01). Second panel: GWAS results for rare variants (0.0001 ≤ MAF < 0.01) in UKB.
【2020 Nature; Identification of type 2 diabetes loci in 433,540 East Asian individuals】
Here, to examine T2D risk in East Asian individuals, we carried out a meta-analysis of GWAS data from 77,418 individuals with T2D and 356,122 healthy control individuals (effective sample size, Neff = 211,793). We identified 301 distinct association signals at 183 loci. We identified 61 loci that are newly implicated in predisposition to T2D.
We defined ‘lead’ variants as the strongest T2D-associated variants with P < 5 × 10−8 and defined the region ±500 kb from the lead variant as a locus. 我们将“先导”变异定义为 P < 5 × 10−8 的最强 T2D 相关变异,并将先导变异 ±500 kb 的区域定义为基因座。
A locus was considered ‘novel’ if the lead variant was located at least 500 kb from previously reported T2D-associated variants in any ancestry. 如果先导变异与任何祖源人群中之前报道的 T2D 相关变异相距至少 500 kb,则该位点被认为是“新的”。
We identified lead variants associated with T2D at 183 loci, of which 51 were novel. 我们在 183 个位点鉴定了与 T2D 相关的先导变异,其中 51 个是新的。
Lead variants at all novel loci were common (minor allele frequency (MAF) ≥ 5%). 所有新位点的先导变异都是常见变异(次要等位基因频率 (MAF) ≥ 5%)。
Using genome-wide complex trait analysis (GCTA), we identified 301 distinct association signals that met a locus-wide significance threshold of P < 1 × 10−5, 228 of which were genome-wide significant (P < 5 × 10−8). 使用全基因组复杂性状分析 (GCTA),我们确定了 301 个不同的关联信号,这些信号满足 P < 1 × 10−5 的全基因座显著性阈值,其中 228 个具有全基因组显著性 (P < 5 × 10−8)。
Overall, we observed 2–4 signals at 46 loci and 5 or more signals at 12 loci. 总体而言,我们在 46 个位点观察到 2–4 个信号,在 12 个位点观察到 5 个或更多信号。
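The locus and novelty definitions quoted above can be expressed as a small sketch. It assumes variants are given as (chromosome, base-pair position) pairs and that "at least 500 kb away" is checked per chromosome. The function names and all coordinates are hypothetical illustrations, not data from the paper.

```python
WINDOW_BP = 500_000      # ±500 kb around the lead variant
GENOME_WIDE_P = 5e-8     # genome-wide significance threshold

def is_genome_wide_significant(p_value):
    """A lead variant must have P < 5e-8."""
    return p_value < GENOME_WIDE_P

def locus_bounds(lead_pos):
    """Start and end of the locus: lead position ±500 kb, clipped at 0."""
    return (max(0, lead_pos - WINDOW_BP), lead_pos + WINDOW_BP)

def is_novel(lead, known_variants):
    """Novel if every previously reported variant is on another chromosome
    or at least 500 kb away from the lead variant."""
    chrom, pos = lead
    return all(c != chrom or abs(pos - p) >= WINDOW_BP
               for c, p in known_variants)

# Hypothetical coordinates for illustration only.
known = [("chr1", 1_400_000), ("chr2", 30_000_000)]
print(locus_bounds(1_000_000))
print(is_novel(("chr1", 1_000_000), known))   # within 500 kb of a known variant
print(is_novel(("chr1", 2_000_000), known))   # 600 kb from the nearest known variant
```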