Introduction
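Gene set enrichment analysis starts from the idea of examining functional differences across a set of genes of the same type, rather than one gene at a time. A single gene on its own is weak evidence for a biological function and is rarely convincing.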
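Over-representation analysis (ORA) is the first generation: it identifies functional changes by testing which gene categories are enriched among a preselected gene list, typically with a test based on the hypergeometric distribution (e.g. DAVID).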
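As a rough illustration of the hypergeometric test behind ORA, the sketch below computes the upper-tail p-value for the overlap between a selected gene list and a gene set. The function name `ora_pvalue` and its parameters are made up for this note; real tools such as DAVID apply their own variants and multiple-testing corrections.

```python
from math import comb


def ora_pvalue(n_universe, n_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe: number of genes in the background
    n_set:      number of background genes that belong to the gene set
    n_selected: number of selected (e.g. differentially expressed) genes
    n_overlap:  number of selected genes that belong to the gene set
    All names are hypothetical, chosen for this example only.
    """
    max_overlap = min(n_set, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # 20000 background genes, 100 in the set, 500 selected, 10 overlapping.
    print(ora_pvalue(20000, 100, 500, 10))
```

A small p-value suggests the gene set is over-represented in the selected list. The result depends heavily on how the selected list and the background are defined.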
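The second generation is Functional Class Scoring (FCS): for a chosen gene set, it computes a score from how the set's genes behave across the sample groups, called the enrichment score. This avoids ORA's step of first selecting differentially expressed genes. Examples include GSEA and GLOBALTEST.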
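The third generation computes an enrichment score for each individual sample; the association between these scores and the phenotype can then be assessed with conventional statistical analysis. Examples include PLAGE, Z-score, ssGSEA, and GSVA.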
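To make the single-sample idea concrete, here is a minimal sketch of the combined Z-score approach: each gene is standardized across samples, and the z-scores of the genes in the set are summed per sample and divided by the square root of the number of genes used. The function name and the dictionary input layout are assumptions for this note, not the interface of any package.

```python
from math import sqrt
from statistics import mean, stdev


def combined_zscores(expr, gene_set):
    """Per-sample combined Z-score for one gene set.

    expr:     dict mapping gene name -> list of expression values,
              one value per sample (hypothetical input layout)
    gene_set: iterable of gene names
    """
    genes = sorted(g for g in gene_set if g in expr)
    if not genes:
        raise ValueError("no gene of the set is present in expr")
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    used = 0
    for gene in genes:
        values = expr[gene]
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # a constant gene carries no information; skip it
        used += 1
        for j, value in enumerate(values):
            totals[j] += (value - mu) / sd
    if used == 0:
        raise ValueError("all genes of the set are constant across samples")
    return [t / sqrt(used) for t in totals]


if __name__ == "__main__":
    expr = {"g1": [1, 2, 3], "g2": [2, 4, 6], "g3": [5, 5, 9]}
    print(combined_zscores(expr, {"g1", "g2"}))
```

The resulting list has one score per sample, so it can be fed directly into a t-test, a regression, or a correlation against the phenotype.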
Methods at a glance
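Gene Set Enrichment Analysis (GSEA): 1. rank all genes by their association with the phenotype; 2. test whether the ranks of the genes in the gene set differ from a uniform distribution (weighted Kolmogorov-Smirnov test).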
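The two steps can be sketched as a weighted running sum over the ranked gene list. This is a simplified illustration, not the reference GSEA implementation: the function name, the weight exponent `p`, and the input layout are assumptions, and the permutation step that turns the score into a p-value is left out.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like running-sum statistic.

    ranked_genes: gene names, already sorted by decreasing score
    scores:       ranking metric for each gene, in the same order
    gene_set:     set of gene names to test
    p:            weight exponent (p=0 gives the unweighted KS statistic)
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must hit some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for weight, hit in zip(hit_weights, in_set):
        # Step up (weighted) on a gene in the set, step down otherwise.
        running += weight / total_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


if __name__ == "__main__":
    genes = ["a", "b", "c", "d"]
    metric = [4.0, 3.0, -1.0, -2.0]
    print(enrichment_score(genes, metric, {"a", "b"}))
```

A positive score means the set's genes cluster at the top of the ranking, a negative score at the bottom. In practice the score is compared against a null distribution obtained by permuting phenotype labels or gene labels.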
GLOBALTEST uses a logistic regression model to determine if samples with similar profiles have similar phenotype by testing if the variance of the coefficients of genes in the gene set is different from 0.
Gene Set Analysis (GSA) uses the maxmean statistic to determine if either up- or down-regulation of genes is the trend for which the evidence is the strongest for a particular gene set.
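For intuition, the maxmean statistic can be sketched as follows: average the positive parts and the negative parts of the gene-level scores separately over the whole set, then keep whichever is larger. Returning the negative-side value with a minus sign is a convention assumed here for readability, and the function name is made up for this note.

```python
def maxmean(z_scores):
    """Maxmean statistic for the gene-level scores of one gene set.

    Both averages divide by the full set size, so a few extreme genes
    cannot dominate a set that is otherwise flat.
    """
    n = len(z_scores)
    if n == 0:
        raise ValueError("gene set is empty")
    pos = sum(max(z, 0.0) for z in z_scores) / n   # mean positive part
    neg = sum(max(-z, 0.0) for z in z_scores) / n  # mean negative part
    # Assumed convention: sign the result by the dominant direction.
    return pos if pos >= neg else -neg


if __name__ == "__main__":
    print(maxmean([2.0, 2.0, -1.0, -1.0]))
    print(maxmean([-3.0, 1.0]))
```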
Single Sample GSEA (ssGSEA) calculates a sample-level gene set score by comparing the distribution of gene expression ranks inside and outside the gene set.
Gene Set Variation Analysis (GSVA) uses a non-parametric kernel to estimate the distribution of each gene's expression level across all samples, in order to bring the expression profiles to a common scale, and then computes a Kolmogorov-Smirnov statistic similar to that of GSEA.
Methods
Sensitivity comparison
False-positive comparison
Reference
A Comparison of Gene Set Analysis Methods in Terms of Sensitivity, Prioritization and Specificity