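Hello everyone. In my earlier article, "A practical example of NMF on 10X single-cell (10X spatial transcriptomics) data (exploring tumor features)", I mentioned the concept of a per-cell score for program genes. This kind of score is used in many settings, such as grading how differentiated a cell is or how malignant a tumor cell is. This time, let's walk through how the score is actually calculated.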
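1、We used cell scores in order to evaluate the degree to which individual cells express a certain pre-defined expression program. (Note that the program has to be defined in advance. In the paper, the programs mainly come from the results of NMF analysis. I have already written a lot about NMF, so I won't expand on it here.)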
2、These are initially based on the average expression of the genes from the pre-defined program in the respective cell: Given an input set of genes ($G_j$), we define a score, $SC_j(i)$, for each cell $i$, as the average relative expression (Er) of the genes in $G_j$. (So the starting point is simply the mean expression of each program's genes within each cell.)
3、However, such initial scores may be confounded by cell complexity, as cells with higher complexity have more genes detected (i.e., fewer zeros) and consequently would be expected to have higher cell scores for any gene-set (a natural problem, and one that has to be dealt with). To control for this effect we also add a control gene-set ($G_j^{cont}$); we calculate a similar cell score with the control gene-set and subtract it from the initial cell scores: $SC_j(i) = \mathrm{average}[Er(G_j, i)] - \mathrm{average}[Er(G_j^{cont}, i)]$. (The key question is how this control gene-set should be chosen.)
4、The control gene-set is selected in a way that ensures similar properties (distribution of expression levels) to that of the input gene-set to properly control for the effect of complexity. (Put plainly, the control genes have to match the expression-level profile of the input genes.)
5、First, all analyzed genes are binned into 25 bins of equal size based on their aggregate expression levels (E).
6、Next, for each gene in the given gene-set, we randomly select 100 genes from the same expression bin. (This approach is worth borrowing, although the method is fairly old.)
7、In this way, the control gene-set has a comparable distribution of expression levels to that of the considered gene-set, and is 100-fold larger, such that its average expression is analogous to averaging over 100 randomly-selected gene-sets of the same size as the considered gene-set. (Very clever 😄)
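The steps above can be sketched in Python with NumPy. This is only a minimal sketch under my own assumptions, not the code from the paper: the function name `cell_scores`, the genes-by-cells layout of `er`, passing the aggregate expression as a separate vector, and sampling with replacement when a bin holds fewer than 100 genes are all choices I made for illustration.

```python
import numpy as np


def cell_scores(er, aggregate, gene_idx, n_bins=25, n_ctrl=100, seed=0):
    """Control-adjusted program score for every cell (hypothetical helper).

    er        : (n_genes, n_cells) relative expression Er
    aggregate : (n_genes,) aggregate expression E, used only for binning
    gene_idx  : integer row indices of the program genes
    """
    er = np.asarray(er, dtype=float)
    aggregate = np.asarray(aggregate, dtype=float)
    gene_idx = np.asarray(gene_idx, dtype=int)
    n_genes = er.shape[0]
    rng = np.random.default_rng(seed)

    # Rank all genes by aggregate expression, then cut the ranks into
    # n_bins bins of (nearly) equal size.
    ranks = np.argsort(np.argsort(aggregate, kind="stable"), kind="stable")
    bins = ranks * n_bins // n_genes

    # For each program gene, draw n_ctrl genes from the same expression bin.
    ctrl_idx = []
    for g in gene_idx:
        pool = np.flatnonzero(bins == bins[g])
        # Assumption: fall back to sampling with replacement for small bins.
        ctrl_idx.append(rng.choice(pool, size=n_ctrl, replace=pool.size < n_ctrl))
    ctrl_idx = np.concatenate(ctrl_idx)

    # Score = mean Er over program genes minus mean Er over control genes.
    return er[gene_idx].mean(axis=0) - er[ctrl_idx].mean(axis=0)


if __name__ == "__main__":
    # Synthetic data purely for illustration.
    demo_rng = np.random.default_rng(1)
    er = demo_rng.normal(size=(2500, 50))      # 2500 genes x 50 cells
    aggregate = demo_rng.gamma(2.0, size=2500)
    program = np.arange(30)                    # pretend the first 30 genes form a program
    print(cell_scores(er, aggregate, program).shape)  # (50,)
```

Because every program gene contributes 100 control genes from its own bin, a shift that affects all genes in a cell equally cancels out in the subtraction, which is exactly the complexity effect the control set is meant to remove.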
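We should all improve a little every day. Otherwise, a year from now we will be the same as we are today, and no better.
Life is good, and it is waiting for you to outdo yourself.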