Raw UMI count matrices were imported into R for further processing. For each scRNA-Seq sample, cell calling was performed using the ‘emptyDrops’ function from DropletUtils (version 1.4.3) on the full raw count matrices in order to distinguish cells from empty droplets containing only ambient RNA (i.e., empty droplets are removed with the help of a dedicated algorithm rather than a crude hard cutoff). Raw count matrices were corrected for Illumina index swapping using ‘swappedDrops’. This identified 140,264 non-empty droplets across all single cell pools.

This analysis comes from the paper Spatiotemporal analysis of human intestinal development at single-cell resolution, published in Cell in early 2021, and it is well worth borrowing from. The software it uses, DropletUtils, is described in EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data, published in Genome Biology in 2019 (impact factor 11). The DropletUtils code itself is fairly simple, but the algorithm behind it deserves careful study.
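The core idea behind emptyDrops can be sketched in a few lines. First, build an ambient RNA profile from barcodes that are almost certainly empty. Then ask how surprising each barcode's counts would be if they were just a random draw from that profile. This is a simplified sketch and not the DropletUtils implementation. It assumes a plain multinomial ambient model with a small pseudocount, whereas emptyDrops itself uses a Dirichlet-multinomial model with a smoothed ambient profile. The helper names (`ambient_profile`, `log_multinomial`, `mc_pvalue`) and the 100-UMI "surely empty" cutoff are illustrative assumptions.

```python
import math
import random


def ambient_profile(barcodes, lower=100, pseudocount=0.5):
    """Estimate ambient RNA proportions by pooling low-count barcodes.

    barcodes: list of per-barcode count vectors, all of the same length.
    Barcodes with total UMIs <= lower are ASSUMED to be empty droplets.
    """
    n_genes = len(barcodes[0])
    pooled = [0.0] * n_genes
    for counts in barcodes:
        if sum(counts) <= lower:
            for g, c in enumerate(counts):
                pooled[g] += c
    # Pseudocount keeps every gene's probability above zero (a simplification).
    pooled = [c + pseudocount for c in pooled]
    total = sum(pooled)
    return [c / total for c in pooled]


def log_multinomial(counts, probs):
    """Log-probability of a count vector under a multinomial with these probabilities."""
    n = sum(counts)
    logp = math.lgamma(n + 1)
    for c, p in zip(counts, probs):
        logp += c * math.log(p) - math.lgamma(c + 1)
    return logp


def mc_pvalue(counts, probs, n_iter=500, seed=0):
    """Monte Carlo p-value: how often is a random ambient draw of the same
    total size at least as unlikely as this barcode's observed counts?"""
    rng = random.Random(seed)
    observed = log_multinomial(counts, probs)
    n = sum(counts)
    genes = range(len(probs))
    hits = 0
    for _ in range(n_iter):
        sim = [0] * len(probs)
        for g in rng.choices(genes, weights=probs, k=n):
            sim[g] += 1
        if log_multinomial(sim, probs) <= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)
```

A barcode whose counts look like ambient RNA gets a large p-value. A barcode with a composition the ambient profile cannot explain gets a tiny one. In the real function these p-values are then corrected for multiple testing (an FDR column) before cells are called.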
Droplet barcodes for which a high percentage of total UMIs originated from mitochondrial RNAs were filtered out, as were barcodes with low total UMI counts. Total RNA content (notably higher in endothelial and myeloid cell populations) and mitochondrial RNA content (notably higher in epithelial cells) are highly cell type dependent. These thresholds were therefore derived separately for each compartment: after an initial clustering of all cells, the empirical distributions within each compartment were examined and thresholded individually.
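The paper set these thresholds by inspecting each compartment's distributions. A common automated stand-in is a median-absolute-deviation (MAD) rule applied separately within each compartment. That rule is swapped in here and is not the paper's method. The sketch assumes each cell is a dict with hypothetical keys `compartment`, `total_umi` and `pct_mito`, and uses an illustrative cutoff of 3 MADs on log10 total UMIs and on mitochondrial percentage.

```python
import math
from collections import defaultdict
from statistics import median


def mad(values):
    """Median absolute deviation (unscaled)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def compartment_thresholds(cells, n_mads=3.0):
    """Per-compartment lower bound on log10 total UMIs and upper bound on % mito."""
    by_comp = defaultdict(list)
    for cell in cells:
        by_comp[cell["compartment"]].append(cell)
    thresholds = {}
    for comp, members in by_comp.items():
        log_umi = [math.log10(c["total_umi"]) for c in members]
        mito = [c["pct_mito"] for c in members]
        thresholds[comp] = {
            "min_log_umi": median(log_umi) - n_mads * mad(log_umi),
            "max_pct_mito": median(mito) + n_mads * mad(mito),
        }
    return thresholds


def passes_qc(cell, thresholds):
    """True if the cell clears its own compartment's UMI and mito thresholds."""
    t = thresholds[cell["compartment"]]
    return (math.log10(cell["total_umi"]) >= t["min_log_umi"]
            and cell["pct_mito"] <= t["max_pct_mito"])
```

With per-compartment thresholds, a mitochondrial fraction that is normal for epithelial cells is not mistaken for damage. The same fraction can still be flagged in a compartment where it is unusual.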
For each individual pool, the Seurat R package (Butler et al., 2018; version …) was used to normalize expression values for total UMI counts per cell. Highly variable genes were identified by fitting the mean-variance relationship, and dimensionality reduction was performed using principal component analysis (PCA). Scree plots were used to choose the number of principal components for clustering in each pool. Cells were then clustered with the Louvain algorithm for modularity optimization, using a k-nearest-neighbor (kNN) graph as input. Clusters were visualized with the UMAP algorithm (McInnes et al., 2018), using the principal components as input and n.neighbors = 30, spread = 1 and min.dist = 0.1.
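The normalization and graph-building steps can be sketched in plain Python. The normalization assumes Seurat's default `LogNormalize` scheme: each count is divided by the cell's total UMIs, multiplied by a scale factor of 10,000, and then passed through `log1p`. The paper does not say which scheme was used. The kNN graph uses Euclidean distance on PC coordinates. Seurat's `FindNeighbors` additionally turns the kNN graph into a shared-nearest-neighbor graph before Louvain clustering, and that step is omitted here. Function names are hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style scaling for one cell: log1p(count / total * scale_factor)."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]


def knn_graph(points, k):
    """For each point (e.g. a cell's PC coordinates), return the indices of its
    k nearest neighbours by Euclidean distance, excluding the point itself."""
    graph = []
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        graph.append([j for _, j in dists[:k]])
    return graph
```

The brute-force neighbour search is quadratic in the number of cells. That is fine for illustration, but real pipelines use approximate nearest-neighbour indexes.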
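This looks routine, but pay attention to the settings n.neighbors = 30, spread = 1 and min.dist = 0.1. I suspect many people simply keep the defaults when running UMAP. The defaults in Seurat's RunUMAP are n.neighbors = 30, spread = 1 and min.dist = 0.3, so the only parameter the authors changed is min.dist, lowered from 0.3 to 0.1. Here is what the three parameters mean: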
- n.neighbors: This determines the number of neighboring points used in local approximations of manifold structure. Larger values will result in more global structure being preserved at the loss of detailed local structure. In general this parameter should usually be in the range 5 to 50. (This ties into manifold learning. I won't expand on it here, since my earlier post, 10X single-cell (10X spatial transcriptomics) dimensionality reduction: tSNE (algorithm basics), covers it in detail.)
- spread: The effective scale of embedded points. In combination with min.dist, this determines how clustered/clumped the embedded points are. (In short, it sets the overall scale of the embedded structure.)
- min.dist: This controls how tightly the embedding is allowed to compress points together. Larger values ensure embedded points are more evenly distributed, while smaller values allow the algorithm to optimise more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5. (In short, it sets how tightly points are packed together in the embedding.)
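How `min.dist` and `spread` act can be seen from the curve UMAP fits in the embedding space. In umap-learn (uwot uses the same scheme), the low-dimensional similarity between two points at distance d is 1 / (1 + a·d^(2b)). The constants a and b are fitted so that this curve approximates a target that equals 1 for d < min.dist and decays as exp(-(d - min.dist) / spread) beyond it. The sketch below reproduces that fit with a coarse-to-fine grid search instead of the nonlinear least-squares routine the libraries use. The grid ranges and step sizes are assumptions.

```python
import math


def target_curve(d, min_dist, spread):
    """Target similarity: flat at 1 up to min_dist, then exponential decay set by spread."""
    if d < min_dist:
        return 1.0
    return math.exp(-(d - min_dist) / spread)


def umap_curve(d, a, b):
    """UMAP's low-dimensional similarity 1 / (1 + a * d^(2b))."""
    return 1.0 / (1.0 + a * d ** (2 * b))


def sse(a, b, xs, ys):
    """Sum of squared errors between the UMAP curve and the target samples."""
    return sum((umap_curve(x, a, b) - y) ** 2 for x, y in zip(xs, ys))


def fit_ab(min_dist, spread, n_points=300):
    """Grid-search (a, b) so umap_curve approximates target_curve on [0, 3 * spread]."""
    xs = [3 * spread * i / (n_points - 1) for i in range(n_points)]
    ys = [target_curve(x, min_dist, spread) for x in xs]
    # Coarse pass over an assumed search box: a in [0.1, 5.0], b in [0.3, 2.0].
    coarse = [(a / 10, b / 20) for a in range(1, 51) for b in range(6, 41)]
    a0, b0 = min(coarse, key=lambda ab: sse(ab[0], ab[1], xs, ys))
    # Fine pass with step 0.01 around the coarse optimum.
    fine = [(a0 + da / 100, b0 + db / 100)
            for da in range(-10, 11) for db in range(-10, 11)]
    return min(fine, key=lambda ab: sse(ab[0], ab[1], xs, ys))
```

For example, compare `fit_ab(0.1, 1.0)` with `fit_ab(0.3, 1.0)`. With min.dist = 0.3 the fitted curve stays close to 1 over a longer distance, so nearby points are not pulled as tightly together. The paper's min.dist = 0.1 lets points within a cluster pack more densely.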
Cells from separate pools were merged, and the pool batch effect was corrected using the Harmony algorithm (version 1.0). Clustering and visualization of the merged cells from all pools were performed as before with the Louvain and UMAP algorithms, using the Harmony dimensionality reduction as input instead of principal components. Merged-pool clusters were compared with the cell types obtained from individual pools to ensure that cell type heterogeneity was not lost through batch correction.
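The paper does not say how merged clusters were compared with per-pool cell types. One simple option, assumed here, is to build a contingency table and compute the adjusted Rand index (ARI) between the two labelings of the same cells. An ARI near 1 means the merged clustering reproduces the per-pool partition. A per-pool type that is absorbed into another merged cluster, or split across several, shows up directly in the table. Function names are hypothetical.

```python
from collections import Counter
from math import comb


def crosstab(labels_a, labels_b):
    """Count cells for every (label_a, label_b) pair."""
    return Counter(zip(labels_a, labels_b))


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in crosstab(labels_a, labels_b).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

The index only depends on which cells are grouped together, not on the label names. Per-pool labels like "T cell" can therefore be compared directly with numeric merged cluster IDs.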
The merged cell data were then divided into compartments based on clustering analysis and marker gene expression, as outlined above. Cells from the epithelial, endothelial, pericyte, muscularis, neural, fibroblast, immune, myofibroblast and mesothelial compartments were subset for further analysis. For each compartment, we carried out compartment-specific QC, batch correction and clustering analyses as described above.
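Take note: when each cell type from the integrated multi-sample data is re-clustered, the authors state: "For each compartment, we carried out compartment-specific QC, batch correction and clustering analyses as described above." This point matters. Do not fall back on the old routine of just subsetting the cells and re-clustering them without redoing QC and batch correction within the subset.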
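Highly variable gene (HVG) selection shows why the analysis should be redone inside each compartment. Across the full dataset, genes that separate compartments dominate the variance. Within a single compartment, different genes vary. The sketch ranks genes by a simple variance-to-mean (dispersion) score as a stand-in for Seurat's `vst` method. The function name, gene names and matrix are made-up toy data for illustration only.

```python
from statistics import mean, pvariance


def top_variable_genes(expr, genes, k):
    """Rank genes by dispersion (variance / mean) across cells; return the top k names.

    expr: list of per-cell expression vectors (rows = cells, columns = genes).
    """
    scores = []
    for g, name in enumerate(genes):
        values = [cell[g] for cell in expr]
        m = mean(values)
        scores.append((pvariance(values) / m if m > 0 else 0.0, name))
    scores.sort(reverse=True)
    return [name for _, name in scores[:k]]


genes = ["EPCAM", "PECAM1", "LGR5"]
epithelial = [[12, 0, 1], [12, 0, 3], [12, 0, 1], [12, 0, 3]]  # toy values
endothelial = [[0, 10, 0]] * 4                                 # toy values
print(top_variable_genes(epithelial + endothelial, genes, 1))  # compartment marker wins
print(top_variable_genes(epithelial, genes, 1))                # within-compartment signal
```

On all cells the compartment marker tops the ranking. Restricted to the toy epithelial subset, the gene that varies within that compartment comes out on top, and re-selecting features per compartment is what makes such structure visible.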