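Because of technical factors, scRNA-seq data can show cell-to-cell variation that merely reflects differences in the number of molecules detected in each cell. To separate biological heterogeneity from this technical variation, the paper proposes using Pearson residuals from a regularized negative binomial regression, in which each cell's sequencing depth is a covariate in a generalized linear model. These residuals remove the influence of sequencing depth while preserving biological heterogeneity.
Original article: https://link.springer.com/article/10.1186/s13059-019-1874-1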
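As a rough illustration of the idea, the sketch below computes a Pearson residual for a single gene in a single cell under a negative binomial model with log10 sequencing depth as the covariate. The function name and the parameter values (BETA0, BETA1, THETA) are hypothetical and chosen only for the example; in the paper these parameters are fitted per gene and then regularized, a step this sketch does not attempt. The optional clipping to the square root of the number of cells is included as an assumption about how extreme residuals might be bounded.

```python
import math

def pearson_residual(count, depth, beta0, beta1, theta, n_cells=None):
    """Pearson residual of one UMI count under a negative binomial GLM.

    Model (log link): log(mu) = beta0 + beta1 * log10(depth)
    NB variance:      mu + mu^2 / theta
    """
    mu = math.exp(beta0 + beta1 * math.log10(depth))
    sigma = math.sqrt(mu + mu * mu / theta)
    z = (count - mu) / sigma
    if n_cells is not None:
        # Optional: bound extreme residuals to +/- sqrt(number of cells).
        clip = math.sqrt(n_cells)
        z = max(-clip, min(clip, z))
    return z

# Hypothetical parameters: with beta1 = ln(10) the expected count is
# proportional to depth, here mu = 0.001 * depth.
BETA0 = math.log(0.001)
BETA1 = math.log(10)
THETA = 10.0

# Two cells expressing the gene at the same relative level, but sequenced
# at different depths: both observed counts match their expectation.
deep = pearson_residual(2, 2000, BETA0, BETA1, THETA)
shallow = pearson_residual(1, 1000, BETA0, BETA1, THETA)

# A cell with more molecules than its depth predicts gets a positive residual.
enriched = pearson_residual(6, 2000, BETA0, BETA1, THETA)
```

Under these assumed parameters, `deep` and `shallow` are both approximately zero despite the twofold difference in raw counts, while `enriched` is clearly positive. The residual measures deviation from what the cell's depth predicts, not the raw count itself.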
Downstream analyses of Pearson residuals are unaffected by differences in sequencing depth. a UMAP embedding of the 33,148 cell PBMC dataset using either log-normalization or Pearson residuals. Both normalization schemes lead to similar results with respect to the major and minor cell populations in the dataset. However, in analyses of log-normalized data, cells within a cluster are ordered along a gradient that is correlated with sequencing depth. b Within the four major cell types, the percent of variance explained by sequencing depth under both normalization schemes. c UMAP embedding of two groups of biologically identical CD14+ monocytes, where one group was randomly downsampled to 50% depth. d Results of differential expression (DE) test between the two groups shown in c. Gray areas indicate expected group mean difference by chance and a false discovery rate cutoff of 1%. e Results of DE test between CD14+ and CD16+ monocytes, before and after randomly downsampling the CD16+ cells to 20% depth.
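The figure above includes two groups of cells, shown in red and cyan, where one group simulates sequencing depth cut in half. In the log-normalized data (panel a), the effect of sequencing depth is not removed, whereas Pearson residuals eliminate it. The differential expression results between the two groups, and the variance explained by sequencing depth, point the same way: Pearson residuals perform better.
Therefore, when analyzing single-cell data with Seurat, you can choose SCTransform for normalization. For usage, see: https://www.jieandze1314.com/post/cnposts/scrna-sctransform/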
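One way to see why log-normalization can retain a depth effect is a small numerical sketch. It assumes, purely for illustration, that counts for one gene follow a Poisson distribution with mean equal to a fixed expression fraction times the cell's depth (the paper itself uses a negative binomial model), and that log-normalization is log1p of counts scaled to 10,000 per cell. The function names and the fraction 0.001 are hypothetical. The code compares the expected normalized value at full depth and at half depth.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing k counts under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def expected_values(fraction, depth, scale=10_000, max_count=60):
    """Expected log-normalized value and expected Pearson residual.

    Assumes counts ~ Poisson(fraction * depth); the sum is truncated at
    max_count, which is far into the tail for the small means used here.
    """
    lam = fraction * depth
    mean_lognorm = 0.0
    mean_residual = 0.0
    for k in range(max_count + 1):
        p = poisson_pmf(k, lam)
        # Log-normalization: scale counts to a common total, then log1p.
        mean_lognorm += p * math.log1p(k / depth * scale)
        # Poisson Pearson residual: (observed - expected) / sqrt(variance).
        mean_residual += p * (k - lam) / math.sqrt(lam)
    return mean_lognorm, mean_residual

# Same gene, same true expression fraction, two sequencing depths.
full_lognorm, full_residual = expected_values(0.001, 2000)
half_lognorm, half_residual = expected_values(0.001, 1000)
```

In this setup the expected log-normalized value is noticeably higher at full depth than at half depth, even though the underlying expression fraction is identical, while the expected Pearson residual is essentially zero at both depths. This is only a toy calculation for a single gene under the stated assumptions, not a reproduction of the paper's analysis.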
References:
https://www.jieandze1314.com/post/cnposts/scrna-sctransform/
https://satijalab.org/seurat/v3.0/integration.html