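Hello everyone. Today we summarize the principles behind identifying tumor cells from single-cell data. The biggest problem in this task is the reference: which cells should serve as the reference? When epithelial cells turn cancerous, normal epithelial cells are the natural reference. In practice, though, single-cell data alone often cannot separate malignant from non-malignant cells of the same type, and CNV-based calls also need careful manual supervision and data analysis. This time, let's walk through how malignant cells are identified.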
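One thing to note: you must use a matched single-cell dataset that actually contains malignant cells. The analysis will still produce output when only normal cell types are present, but that output is useless.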
For CNV inference, most people use inferCNV or CopyKAT; the underlying principles are much the same. Personally I lean toward CopyKAT.
CNV Estimation: Initial CNVs (CNV0) were estimated by sorting the analyzed genes by their chromosomal location and applying a moving average to the relative expression values, with a sliding window of 100 genes within each chromosome.
Everyone should already be familiar with this part. Let's go through it step by step.
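As a rough illustration of the moving average described above, here is a minimal NumPy sketch. The function name, the centered window, and the truncation of the window at chromosome ends are my own assumptions; the quoted text only specifies a 100-gene sliding window within each chromosome, applied to genes sorted by chromosomal location.

```python
import numpy as np

def moving_average_by_chromosome(rel_expr, chrom, window=100):
    """Smooth relative expression along the genome, one chromosome at a time.

    rel_expr: (n_cells, n_genes) array, genes already sorted by position.
    chrom:    (n_genes,) array of chromosome labels, one per gene.
    Assumption: the window is centered and shrinks at chromosome ends.
    """
    rel_expr = np.asarray(rel_expr, dtype=float)
    chrom = np.asarray(chrom)
    cnv0 = np.empty_like(rel_expr)
    for c in np.unique(chrom):
        idx = np.flatnonzero(chrom == c)      # genes on this chromosome
        block = rel_expr[:, idx]
        n = idx.size
        # Prefix sums let every window mean be read off in one step.
        csum = np.concatenate(
            [np.zeros((block.shape[0], 1)), np.cumsum(block, axis=1)], axis=1
        )
        lo = np.arange(n) - window // 2
        hi = np.clip(lo + window, 0, n)       # exclusive end of each window
        lo = np.clip(lo, 0, n)
        cnv0[:, idx] = (csum[:, hi] - csum[:, lo]) / (hi - lo)
    return cnv0
```

Because the window never crosses a chromosome boundary, a gain on one chromosome does not bleed into the first genes of the next.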
Step 1: To avoid considerable impact of any particular gene on the moving average, we limited the relative expression values to [-3,3] by replacing all values above 3 by a ceiling of 3, and replacing values below -3 by a floor of -3. (In case this is unfamiliar: the data are clipped, i.e. extreme values are truncated, and this step is necessary.) This was performed only in the context of CNV estimation.
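The clipping in this step is a one-liner in NumPy. A small sketch, where the function name and the `limit` parameter are only illustrative:

```python
import numpy as np

def clip_relative_expression(rel_expr, limit=3.0):
    """Cap relative expression at [-limit, limit] before smoothing."""
    return np.clip(np.asarray(rel_expr, dtype=float), -limit, limit)
```

For example, `clip_relative_expression([[-5.0, 0.5, 4.0]])` leaves 0.5 untouched and pulls the other two values back to -3 and 3, so a single extreme gene cannot dominate a 100-gene window.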
Step 2: We scored each cell for the extent of CNV signal, defined as the mean of squares of CNV0 values across the genome, and for the correlation between the CNV0 profile of each cell with the average CNV0 profile of all cells from the corresponding tumor. (This should look familiar: these are the two scores used to judge CNV.)
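The two scores can be sketched as follows. This assumes all rows of the input come from one tumor and that "correlation" means Pearson correlation; the quoted text does not name the correlation type, and the function name is hypothetical.

```python
import numpy as np

def cnv_scores(cnv0):
    """Return (cnv_signal, cnv_correlation) for each cell of a single tumor.

    cnv0: (n_cells, n_genes) array of CNV0 values.
    Assumption: Pearson correlation against the tumor-wide mean profile.
    """
    cnv0 = np.asarray(cnv0, dtype=float)
    # CNV signal: mean of squared CNV0 values across the genome.
    signal = np.mean(cnv0 ** 2, axis=1)
    # Average CNV0 profile over all cells of the tumor.
    tumor_mean = cnv0.mean(axis=0)
    a = cnv0 - cnv0.mean(axis=1, keepdims=True)
    b = tumor_mean - tumor_mean.mean()
    denom = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum())
    # Cells with a completely flat profile get correlation 0 instead of NaN.
    corr = np.divide(a @ b, denom, out=np.zeros_like(denom), where=denom > 0)
    return signal, corr
```

Intuitively, the signal measures how far a cell's profile departs from zero, while the correlation measures whether it departs in the same direction as the rest of the tumor.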
Step 3: Putative malignant cells were then defined as those with CNV signal above 0.05 and CNV correlation above 0.5, putative non-malignant cells as those below the two cutoffs, and unresolved cells as those above only one of the thresholds. (These match the thresholds used by the inferCNV software.)
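The three-way call in this step is a pair of boolean masks. A minimal sketch with the quoted cutoffs as defaults; the function name and label strings are my own, and treating a value exactly equal to a cutoff as "not above" is an assumption.

```python
import numpy as np

def classify_cells(signal, corr, signal_cutoff=0.05, corr_cutoff=0.5):
    """Label each cell as malignant, non-malignant, or unresolved."""
    signal = np.asarray(signal, dtype=float)
    corr = np.asarray(corr, dtype=float)
    above_signal = signal > signal_cutoff
    above_corr = corr > corr_cutoff
    # Default covers cells that pass exactly one of the two thresholds.
    labels = np.full(signal.shape, "unresolved", dtype=object)
    labels[above_signal & above_corr] = "malignant"
    labels[~above_signal & ~above_corr] = "non-malignant"
    return labels
```

A cell with signal 0.2 but correlation 0.1 stays "unresolved": it is noisy, but not in the way the rest of the tumor is.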
Step 4: Validation. This initial analysis was based on the average CNV0 of all cells as a reference, which is biased due to the inclusion of many malignant cells. We thus redefined CNV estimations, the CNV signal, and CNV correlations values using the average patterns of nonmalignant cells as a reference. (The true reference must be normal cells of the same cell type.)
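One way to picture the re-referencing is to subtract the mean profile of the putative non-malignant cells from every cell, then recompute the signal and correlation on the result. The quoted text does not spell out the exact arithmetic, so the subtraction below, like the function name, is an assumption rather than the published procedure.

```python
import numpy as np

def rereference_cnv(cnv0, is_nonmalignant):
    """Re-center CNV profiles on the putative non-malignant cells.

    cnv0:            (n_cells, n_genes) array of CNV0 values.
    is_nonmalignant: (n_cells,) boolean mask from the first-pass calls.
    Assumption: the reference is removed by plain subtraction.
    """
    cnv0 = np.asarray(cnv0, dtype=float)
    mask = np.asarray(is_nonmalignant, dtype=bool)
    if not mask.any():
        raise ValueError("no non-malignant cells available as a reference")
    reference = cnv0[mask].mean(axis=0)   # per-gene reference profile
    return cnv0 - reference
```

After this step the non-malignant cells average to zero at every gene by construction, so whatever remains in the malignant cells is measured against a cleaner baseline.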
Step 5: CNV estimate:
Of course, inferCNV also includes a denoising step. If you are interested, take a closer look and summarize it for yourself.
Keep studying the fundamentals. Life is good, and it is better with you.