NBIS系列单细胞转录组数据分析实战(六):细胞类型注释

第六节:细胞类型注释

在本节教程中,我们将进行细胞类型预测分析。我们既可以对单个细胞(每个细胞都获得预测的细胞类型)进行预测,也可以在每个细胞簇上执行。所有方法均基于与其他数据集的相似性比较,可以是sorting分选的单细胞或批量RNA-seq数据集,或针对每种细胞类型使用已知的标记基因。

在这里,我们将从Covid数据中选择一个样本ctrl_13,并预测该样本中的细胞类型。我们利用scPred包中的PBMC参考数据集,使用Seurat包中基于标签转移(label trasfer)的TransferData函数和scPred方法进行细胞类型预测。同时,我们还使用基于每个簇的DEG进行基因集富集分析来预测细胞类型。

即使参考数据集中未包含一些细胞的细胞类型,某些方法也会根据最相似的原则预测每个细胞的细胞类型。当然,这些预测方法也会有一定的不确定性,因此具有较低相似性评分的细胞将无法被分类。目前,已有多种不同的预测细胞类型的方法,这里我们仅介绍其中的一些。

加载所需的R包和数据集

suppressPackageStartupMessages({
    library(Seurat)
    library(venn)
    library(dplyr)
    library(cowplot)
    library(ggplot2)
    library(pheatmap)
    library(rafalib)
    library(scPred)
})

# load the data and select 'ctrl_13` sample
alldata <- readRDS("data/results/covid_qc_dr_int_cl.rds")

# 提取ctrl_13样本的数据
ctrl = alldata[, alldata$orig.ident == "ctrl_13"]

# set active assay to RNA and remove the CCA assay
ctrl@active.assay = "RNA"
ctrl[["CCA"]] = NULL
ctrl
## An object of class Seurat 
## 18121 features across 1129 samples within 1 assay 
## Active assay: RNA (18121 features, 0 variable features)
##  6 dimensional reductions calculated: umap, tsne, harmony, umap_harmony, scanorama, umap_scanorama

获取参考数据集

接下来,我们从scPred包中提取PBMC参考数据集,并进行常规的数据标准化、可变基因筛选、归一化和降维处理。

# 提取scPred包中PBMC参考数据集
reference <- scPred::pbmc_1

reference
## An object of class Seurat 
## 32838 features across 3500 samples within 1 assay 
## Active assay: RNA (32838 features, 0 variable features)

对参考数据集进行预处理

这里,我们使用magittr包中的管道符%>%一次性运行预处理的所有步骤。

reference <- reference %>% NormalizeData() %>% FindVariableFeatures() %>% ScaleData() %>% 
                           RunPCA(verbose = F) %>% RunUMAP(dims = 1:30)

DimPlot(reference, group.by = "cell_type", label = TRUE, repel = TRUE) + NoAxes()
image.png

同样的,我们对ctrl_13样本执行相应的操作,并提取先前整合数据中在0.3分辨率下的聚类分群结果。

# Set the identity as louvain with resolution 0.3
ctrl <- SetIdent(ctrl, value = "CCA_snn_res.0.5")

ctrl <- ctrl %>% NormalizeData() %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA(verbose = F) %>% RunUMAP(dims = 1:30)

DimPlot(ctrl, label = TRUE, repel = TRUE) + NoAxes()
image.png

使用Seurat包中的标签转移方法进行细胞类型预测

First we will run label transfer using a similar method as in the integration exercise. But, instad of CCA the default for the ’FindTransferAnchors` function is to use “pcaproject”, e.g. the query datset is projected onto the PCA of the reference dataset. Then, the labels of the reference data are predicted.

# 使用FindTransferAnchors函数寻找query和reference数据集之间的anchors
transfer.anchors <- FindTransferAnchors(reference = reference, query = ctrl, dims = 1:30)
# 使用TransferData函数进行标签转移的细胞类型预测
predictions <- TransferData(anchorset = transfer.anchors, refdata = reference$cell_type, 
    dims = 1:30)

# 将细胞类型预测的结果添加到metadata中
ctrl <- AddMetaData(object = ctrl, metadata = predictions)

DimPlot(ctrl, group.by = "predicted.id", label = T, repel = T) + NoAxes()
image.png

Now plot how many cells of each celltypes can be found in each cluster.

ggplot(ctrl@meta.data, aes(x = CCA_snn_res.0.5, fill = predicted.id)) + geom_bar() + theme_classic()
image.png

使用scPred预测细胞类型

scPred will train a classifier based on all principal components. First, getFeatureSpace will create a scPred object stored in the @misc slot where it extracts the PCs that best separates the different celltypes. Then trainModel will do the actual training for each celltype.

reference <- getFeatureSpace(reference, "cell_type")

# ●  Extracting feature space for each cell type...
## DONE!

reference <- trainModel(reference)
## ●  Training models for each cell type...
## maximum number of iterations reached 0.000116588 -0.0001156614DONE!

We can then print how well the training worked for the different celltypes by printing the number of PCs used for each, the ROC value and Sensitivity/Specificity.

get_scpred(reference)
## 'scPred' object
## ✔  Prediction variable = cell_type 
## ✔  Discriminant features per cell type
## ✔  Training model(s)
## Summary
## 
## |Cell type   |    n| Features|Method    |   ROC|  Sens|  Spec|
## |:-----------|----:|--------:|:---------|-----:|-----:|-----:|
## |B cell      |  280|       50|svmRadial | 1.000| 0.964| 1.000|
## |CD4 T cell  | 1620|       50|svmRadial | 0.997| 0.971| 0.975|
## |CD8 T cell  |  945|       50|svmRadial | 0.985| 0.902| 0.978|
## |cDC         |   26|       50|svmRadial | 0.995| 0.547| 1.000|
## |cMono       |  212|       50|svmRadial | 0.994| 0.958| 0.970|
## |ncMono      |   79|       50|svmRadial | 0.998| 0.582| 1.000|
## |NK cell     |  312|       50|svmRadial | 0.999| 0.936| 0.996|
## |pDC         |   20|       50|svmRadial | 1.000| 0.700| 1.000|
## |Plasma cell |    6|       50|svmRadial | 1.000| 0.800| 1.000|

我们可以通过更改参数和测试不同类型的模型来优化每个数据集的参数,有关更多信息,请访问:https : //powellgenomicslab.github.io/scPred/articles/introduction.html。但是目前,我们将继续使用该模型进行细胞类型预测。

接下来,我们可以根据训练好的数据集来预测细胞类型,其中scPred会基于Harmony将两个数据集对齐,然后执行分类。

ctrl <- scPredict(ctrl, reference)
## ●  Matching reference with new dataset...
##   ─ 2000 features present in reference loadings
##   ─ 1774 features shared between reference and new dataset
##   ─ 88.7% of features in the reference are present in new dataset
## ●  Aligning new data to reference...
## ●  Classifying cells...
## DONE!

DimPlot(ctrl, group.by = "scpred_prediction", label = T, repel = T) + NoAxes()
image.png

Now plot how many cells of each celltypes can be found in each cluster.

ggplot(ctrl@meta.data, aes(x = CCA_snn_res.0.5, fill = scpred_prediction)) + geom_bar() + 
    theme_classic()
image.png

比较不同预测方法的结果

现在,我们将使用scPred包中的crossTab 函数方便的比较这两种分类方法的结果,该函数会得到两种分类结果中的交集。

crossTab(ctrl, "predicted.id", "scpred_prediction")
image.png

使用GSEA富集分析预测细胞类型

对于细胞簇水平上的细胞类型预测,我们还可以使用针对特定细胞类型标记基因的GSEA富集预测。类似于我们在差异表达分析中对DEG进行功能富集的方式。有一些可用的细胞类型基因集的数据库,如CellMarkerPanglaoDBMSigDB。我们还可以查看参考数据集中的DEG与您正在分析的数据集之间的重叠。

使用重叠的DEGs进行基因集的富集

首先,我们提取Covid-19数据集和参考数据集中的top DEGs。

# run differential expression in our dataset, using clustering at resolution 0.3
alldata <- SetIdent(alldata, value = "CCA_snn_res.0.5")

DGE_table <- FindAllMarkers(alldata, logfc.threshold = 0, test.use = "wilcox", min.pct = 0.1, 
    min.diff.pct = 0, only.pos = TRUE, max.cells.per.ident = 20, return.thresh = 1, 
    assay = "RNA")

# split into a list
DGE_list <- split(DGE_table, DGE_table$cluster)

unlist(lapply(DGE_list, nrow))
##    0    1    2    3    4    5    6    7    8    9   10 
## 3153 2483 3394 2837 2573 3956 2150 3753 2465 2142 3342

# Compute differential gene expression in reference dataset (that has cell annotation)
reference <- SetIdent(reference, value = "cell_type")

reference_markers <- FindAllMarkers(reference, min.pct = 0.1, min.diff.pct = 0.2, 
    only.pos = T, max.cells.per.ident = 20, return.thresh = 1)

# Identify the top cell marker genes in reference dataset select top 50 with highest foldchange among top 100 signifcant genes.
reference_markers <- reference_markers[order(reference_markers$avg_logFC, decreasing = T), ]
top50_cell_selection <- reference_markers %>% group_by(cluster) %>% top_n(-100, p_val) %>% 
    top_n(50, avg_logFC)

# Transform the markers into a list
ref_list = split(top50_cell_selection$gene, top50_cell_selection$cluster)

unlist(lapply(ref_list, length))
##  CD8 T cell  CD4 T cell       cMono      B cell     NK cell         pDC 
##          30          14          50          50          50          50 
##      ncMono         cDC Plasma cell 
##          50          50          50

接下来,我们基于这些细胞类型特异的DEGs进行GSEA富集分析,并检查DEGs在参考数据集中的富集程度。

suppressPackageStartupMessages(library(fgsea))

# run fgsea for each of the clusters in the list
res <- lapply(DGE_list, function(x) {
    gene_rank <- setNames(x$avg_logFC, x$gene)
    fgseaRes <- fgsea(pathways = ref_list, stats = gene_rank, nperm = 10000)
    return(fgseaRes)
})
names(res) <- names(DGE_list)

# You can filter and resort the table based on ES, NES or pvalue
res <- lapply(res, function(x) {
    x[x$pval < 0.1, ]
})
res <- lapply(res, function(x) {
    x[x$size > 2, ]
})
res <- lapply(res, function(x) {
    x[order(x$NES, decreasing = T), ]
})
res
## $`0`
##    pathway         pval        padj        ES      NES nMoreExtreme size
## 1:   cMono 0.0000999900 0.000299970 0.9588744 2.095372            0   48
## 2:  ncMono 0.0000999900 0.000299970 0.8410417 1.833205            0   46
## 3:     cDC 0.0000999900 0.000299970 0.8160502 1.772541            0   43
## 4:     pDC 0.0005017561 0.001128951 0.7652807 1.584164            4   21
## 5:  B cell 0.0069809794 0.012565763 0.7410824 1.493208           68   15
## 6: NK cell 0.0150437919 0.022565688 0.7579453 1.475439          145   11
##                                     leadingEdge
## 1:      S100A8,S100A9,LYZ,S100A12,VCAN,FCN1,...
## 2:     CTSS,TYMP,CST3,S100A11,AIF1,SERPINA1,...
## 3:              LYZ,GRN,TYMP,CST3,AIF1,CPVL,...
## 4:         GRN,MS4A6A,CST3,MPEG1,CTSB,TGFBI,...
## 5: NCF1,LY86,MARCH1,HLA-DRB5,POU2F2,PHACTR1,...
## 6:       TYROBP,FCER1G,SRGN,CCL3,CD63,MYO1F,...
## 
## $`1`
##        pathway         pval        padj        ES      NES nMoreExtreme size
## 1:      B cell 0.0000999900 0.000408455 0.8973600 1.988512            0   46
## 2:         cDC 0.0001021138 0.000408455 0.8750203 1.778727            0   14
## 3:         pDC 0.0011068625 0.002951633 0.7719359 1.612224           10   18
## 4: Plasma cell 0.0631477722 0.101036436 0.7333284 1.389860          590    8
## 5:      ncMono 0.0913272011 0.121769601 0.8427419 1.344760          694    3
##                                             leadingEdge
## 1:      CD79A,TCL1A,LINC00926,MS4A1,CD79B,TNFRSF13C,...
## 2: CD74,HLA-DQB1,HLA-DRA,HLA-DPB1,HLA-DRB1,HLA-DQA1,...
## 3:               CD74,TCF4,BCL11A,IRF8,HERPUD1,SPIB,...
## 4:                PLPP5,ISG20,HERPUD1,MZB1,ITM2C,JCHAIN
## 5:                                  HLA-DPA1,POU2F2,LYN
## 
## $`2`
##       pathway         pval         padj        ES      NES nMoreExtreme size
## 1: CD8 T cell 0.0001001603 0.0003505609 0.9432365 2.276503            0   29
## 2:    NK cell 0.0001000801 0.0003505609 0.8661551 2.108346            0   32
## 3: CD4 T cell 0.0007128431 0.0016633005 0.9256514 1.721886            5    5
## 4:        pDC 0.0398826342 0.0697946099 0.7225340 1.474840          366    8
##                            leadingEdge
## 1:   GZMH,CD8A,CD3D,CD3G,CD8B,CCL5,...
## 2: CCL5,NKG7,GZMA,FGFBP2,CCL4,GZMM,...
## 3:                      CD3G,CD3E,IL7R
## 4:               C12orf75,GZMB,SELENOS
## 
## $`3`
##       pathway         pval         padj        ES      NES nMoreExtreme size
## 1: CD8 T cell 0.0001002205 0.0003006615 0.9553147 2.173136            0   25
## 2:    NK cell 0.0001001302 0.0003006615 0.8360990 1.915575            0   27
## 3: CD4 T cell 0.0026475455 0.0052950910 0.8663678 1.674383           23    7
##                           leadingEdge
## 1: DUSP2,CCL5,CD3D,LYAR,CD8A,CD3E,...
## 2: CCL5,KLRB1,GZMM,CMC1,CST7,GZMA,...
## 3:        CD3E,CD3G,IL7R,PIK3IP1,TCF7
## 
## $`4`
##    pathway         pval       padj        ES      NES nMoreExtreme size
## 1:   cMono 0.0005000500 0.00270081 0.7200941 1.556501            4   45
## 2:  ncMono 0.0006001801 0.00270081 0.7215360 1.545517            5   38
## 3:  B cell 0.0191673788 0.04312660 0.9013244 1.505237          156    4
## 4:     cDC 0.0049014704 0.01470441 0.6806718 1.448777           48   34
## 5:     pDC 0.0843953838 0.15191169 0.6245503 1.295140          840   22
##                                 leadingEdge
## 1:   CST3,FCER1G,COTL1,LYZ,STXBP2,AP1S2,...
## 2: OAZ1,TIMP1,CST3,FKBP1A,IFITM3,FCER1G,...
## 3:                     PDLIM1,HLA-DRB5,NCF1
## 4: GAPDH,CST3,FCER1G,COTL1,LYZ,HLA-DRB5,...
## 5:       PTCRA,CST3,TXN,CTSB,APP,MS4A6A,...
## 
## $`5`
##        pathway         pval         padj        ES      NES nMoreExtreme size
## 1:     NK cell 0.0000999900 0.0004019697 0.9377894 2.439144            0   50
## 2:  CD8 T cell 0.0001004924 0.0004019697 0.9138145 2.225902            0   23
## 3:         pDC 0.0018019928 0.0048053141 0.8029287 1.747492           16   10
## 4:      ncMono 0.0115903265 0.0231806531 0.7906541 1.605347          103    7
## 5: Plasma cell 0.0228548516 0.0365677626 0.5842785 1.450093          227   28
##                                  leadingEdge
## 1:        SPON2,GNLY,PRF1,GZMB,CD7,CLIC3,...
## 2:       GNLY,PRF1,GZMB,NKG7,FGFBP2,CTSW,...
## 3: GZMB,C12orf75,RRBP1,PLAC8,ALOX5AP,HSP90B1
## 4:                   FCGR3A,IFITM2,RHOC,HES4
## 5:  CD38,FKBP11,SLAMF7,SDF2L1,PRDM1,PPIB,...
## 
## $`6`
##       pathway         pval         padj        ES      NES nMoreExtreme size
## 1: CD4 T cell 0.0001020616 0.0006123699 0.9154707 1.793599            0   13
## 2: CD8 T cell 0.0011968230 0.0035904689 0.8968754 1.622288           10    7
##                          leadingEdge
## 1: IL7R,LTB,LDHB,RCAN3,MAL,NOSIP,...
## 2:      IL32,CD3E,CD3D,CD2,CD3G,CD8B
## 
## $`7`
##        pathway         pval         padj        ES      NES nMoreExtreme size
## 1:     NK cell 0.0000999900 0.0004013646 0.9150850 2.398852            0   46
## 2:  CD8 T cell 0.0001003412 0.0004013646 0.9281782 2.318084            0   26
## 3:      ncMono 0.0024107450 0.0064286534 0.8617933 1.703177           20    6
## 4:         pDC 0.0130282809 0.0260565618 0.7292142 1.595377          122   10
## 5: Plasma cell 0.0451813264 0.0722901222 0.5447121 1.375779          450   29
##                                   leadingEdge
## 1:        FGFBP2,GNLY,NKG7,CST7,GZMB,CTSW,...
## 2:        FGFBP2,GNLY,NKG7,CST7,GZMB,CTSW,...
## 3:                         FCGR3A,IFITM2,RHOC
## 4:  GZMB,C12orf75,HSP90B1,ALOX5AP,RRBP1,PLAC8
## 5: PRDM1,FKBP11,HSP90B1,PPIB,SPCS2,SDF2L1,...
## 
## $`8`
##        pathway         pval         padj        ES      NES nMoreExtreme size
## 1:      B cell 0.0000999900 0.0004571312 0.8983372 1.896468            0   45
## 2:         cDC 0.0001015847 0.0004571312 0.8787620 1.717309            0   14
## 3:         pDC 0.0005035247 0.0015105740 0.8235666 1.638689            4   17
## 4: Plasma cell 0.0116822430 0.0175233645 0.7578645 1.481047          114   14
##                                             leadingEdge
## 1:        CD79A,MS4A1,BANK1,CD74,TNFRSF13C,HLA-DQA1,...
## 2: CD74,HLA-DQA1,HLA-DRA,HLA-DPB1,HLA-DQB1,HLA-DPA1,...
## 3:             CD74,JCHAIN,SPIB,HERPUD1,TCF4,CCDC50,...
## 4:                JCHAIN,HERPUD1,ISG20,ITM2C,PEBP1,MZB1
## 
## $`9`
##       pathway         pval         padj        ES      NES nMoreExtreme size
## 1: CD4 T cell 0.0001023227 0.0006139364 0.9248473 1.981064            0   13
## 2: CD8 T cell 0.0668711656 0.1587804395 0.7936868 1.374165          544    4
##                             leadingEdge
## 1: IL7R,TCF7,PIK3IP1,TSHZ2,LTB,LEF1,...
## 2:                   CD3E,CD3G,CD3D,CD2
## 
## $`10`
##    pathway       pval         padj        ES      NES nMoreExtreme size
## 1:  ncMono 0.00009999 0.0002666667 0.9578305 2.052171            0   49
## 2:   cMono 0.00010000 0.0002666667 0.8907972 1.877527            0   35
## 3:     cDC 0.00009999 0.0002666667 0.8272454 1.750653            0   38
## 4: NK cell 0.00255050 0.0051009998 0.8054483 1.571513           24   13
## 5:     pDC 0.04759980 0.0761596766 0.6792488 1.351779          470   16
## 6:  B cell 0.07367357 0.0945474534 0.6570759 1.307652          728   16
##                                               leadingEdge
## 1:                CDKN1C,LST1,FCGR3A,AIF1,COTL1,MS4A7,...
## 2:               LST1,AIF1,COTL1,SERPINA1,FCER1G,PSAP,...
## 3:                   LST1,AIF1,COTL1,FCER1G,CST3,SPI1,...
## 4:             FCGR3A,FCER1G,RHOC,TYROBP,IFITM2,MYO1F,...
## 5:                   CST3,NPC2,PLD4,MPEG1,VAMP8,TGFBI,...
## 6: HLA-DPA1,POU2F2,HLA-DRB5,HLA-DRA,HLA-DPB1,HLA-DRB1,...

现在,我们可以根据每个细胞簇富集的最优结果对它们进行重命名。OBS!请注意,如果某些cluster富集到的所有基因集的p值都不好,那么这个预测的结果将会不太可靠。同样,如果我们使用的基因集无法涵盖所有的细胞类型,那么预测的结果可能只是最相似的细胞类型。

new.cluster.ids <- unlist(lapply(res, function(x) {
    as.data.frame(x)[1, 1]
}))

alldata$ref_gsea <- new.cluster.ids[as.character(alldata@active.ident)]

cowplot::plot_grid(ncol = 2, DimPlot(alldata, label = T, group.by = "CCA_snn_res.0.5") + 
    NoAxes(), DimPlot(alldata, label = T, group.by = "ref_gsea") + NoAxes())
image.png

将富集预测的结果与ctrl_13样本中的其他细胞类型预测方法进行比较。

ctrl$ref_gsea = alldata$ref_gsea[alldata$orig.ident == "ctrl_13"]

cowplot::plot_grid(ncol = 3, DimPlot(ctrl, label = T, group.by = "ref_gsea") + NoAxes() + 
    ggtitle("GSEA"), DimPlot(ctrl, label = T, group.by = "predicted.id") + NoAxes() + 
    ggtitle("LabelTransfer"), DimPlot(ctrl, label = T, group.by = "scpred_prediction") + 
    NoAxes() + ggtitle("scPred"))
image.png

使用已知的带有注释的基因集进行富集

首先,我们从CellMarker数据库下载特定细胞类型的基因集。

# Download gene marker list
if (!dir.exists("data/CellMarker_list/")) {
    dir.create("data/CellMarker_list")
    download.file(url = "http://bio-bigdata.hrbmu.edu.cn/CellMarker/download/Human_cell_markers.txt", 
        destfile = "./data/CellMarker_list/Human_cell_markers.txt")
    download.file(url = "http://bio-bigdata.hrbmu.edu.cn/CellMarker/download/Mouse_cell_markers.txt", 
        destfile = "./data/CellMarker_list/Mouse_cell_markers.txt")
}

读入基因集列表,并做初步的筛选。

# Load the human marker table
markers <- read.delim("data/CellMarker_list/Human_cell_markers.txt")
markers <- markers[markers$speciesType == "Human", ]
markers <- markers[markers$cancerType == "Normal", ]

# Filter by tissue (to reduce computational time and have tissue-specific
# classification) sort(unique(markers$tissueType))
# grep('blood',unique(markers$tissueType),value = T) markers <- markers [
# markers$tissueType %in% c('Blood','Venous blood', 'Serum','Plasma',
# 'Spleen','Bone marrow','Lymph node'), ]

# remove strange characters etc.
celltype_list <- lapply(unique(markers$cellName), function(x) {
    x <- paste(markers$geneSymbol[markers$cellName == x], sep = ",")
    x <- gsub("[[]|[]]| |-", ",", x)
    x <- unlist(strsplit(x, split = ","))
    x <- unique(x[!x %in% c("", "NA", "family")])
    x <- casefold(x, upper = T)
})
names(celltype_list) <- unique(markers$cellName)
# celltype_list <- lapply(celltype_list , function(x) {x[1:min(length(x),50)]} )
celltype_list <- celltype_list[unlist(lapply(celltype_list, length)) < 100]
celltype_list <- celltype_list[unlist(lapply(celltype_list, length)) > 5]

对已知的基因集进行GSEA富集分析。

# run fgsea for each of the clusters in the list
res <- lapply(DGE_list, function(x) {
    gene_rank <- setNames(x$avg_logFC, x$gene)
    fgseaRes <- fgsea(pathways = celltype_list, stats = gene_rank, nperm = 10000)
    return(fgseaRes)
})
names(res) <- names(DGE_list)

# You can filter and resort the table based on ES, NES or pvalue
res <- lapply(res, function(x) {
    x[x$pval < 0.01, ]
})
res <- lapply(res, function(x) {
    x[x$size > 5, ]
})
res <- lapply(res, function(x) {
    x[order(x$NES, decreasing = T), ]
})

# show top 3 for each cluster.
lapply(res, head, 3)
## $`0`
##                   pathway         pval       padj        ES      NES
## 1:             Neutrophil 0.0000999900 0.00427395 0.8225470 1.803093
## 2:             Fibroblast 0.0001042427 0.00427395 0.8978804 1.725605
## 3: CD1C+_B dendritic cell 0.0000999900 0.00427395 0.7846113 1.711199
##    nMoreExtreme size                              leadingEdge
## 1:            0   55 S100A8,S100A9,S100A12,CD14,MNDA,G0S2,...
## 2:            0   10        CD14,VIM,CD36,CKAP4,LRP1,CD44,...
## 3:            0   49  S100A8,S100A9,LYZ,S100A12,VCAN,FCN1,...
## 
## $`1`
##                         pathway         pval       padj        ES      NES
## 1:            Follicular B cell 0.0001026905 0.01314438 0.8905277 1.776479
## 2: Megakaryocyte erythroid cell 0.0020831163 0.03662375 0.8307946 1.620952
##    nMoreExtreme size                         leadingEdge
## 1:            0   12 MS4A1,CD69,FCER2,CD22,CD40,PAX5,...
## 2:           19   10     CD79A,CD83,CD69,FCER2,LY9,CXCR5
## 
## $`2`
##                         pathway         pval         padj        ES      NES
## 1:        CD4+ cytotoxic T cell 0.0000999900 0.0008897005 0.8208075 2.102096
## 2: Megakaryocyte erythroid cell 0.0001005126 0.0008897005 0.8803002 2.071241
## 3:                  CD8+ T cell 0.0001055186 0.0008897005 0.9672513 2.058873
##    nMoreExtreme size                          leadingEdge
## 1:            0   65 GZMH,CCL5,NKG7,KLRG1,GZMA,FGFBP2,...
## 2:            0   22    CD8A,CD3D,CD3G,CD2,CD3E,KLRG1,...
## 3:            0   10     CD8A,CD3D,CD3G,CD8B,CD2,CD3E,...
## 
## $`3`
##                         pathway         pval        padj        ES      NES
## 1:                  CD8+ T cell 0.0001023332 0.001457502 0.9558597 2.023154
## 2: Megakaryocyte erythroid cell 0.0001000801 0.001457502 0.8371009 1.911415
## 3:                T helper cell 0.0001012863 0.001457502 0.8699055 1.893383
##    nMoreExtreme size                         leadingEdge
## 1:            0   13   GZMK,CD3D,CD8A,CD3E,CD3G,CD8B,...
## 2:            0   27 CD3D,CD8A,KLRB1,CD3E,KLRG1,CD3G,...
## 3:            0   16  GZMK,CD3D,KLRB1,CD3E,CD3G,IL7R,...
## 
## $`4`
##                   pathway         pval       padj        ES      NES
## 1:          Megakaryocyte 0.0001033058 0.01580579 0.9165646 1.785300
## 2: Circulating fetal cell 0.0059165346 0.24682396 0.8357811 1.589161
## 3:               Platelet 0.0080661424 0.24682396 0.7460329 1.521198
##    nMoreExtreme size                         leadingEdge
## 1:            0   11     PPBP,PF4,GP9,ITGA2B,CD9,RASGRP2
## 2:           55    9                   PF4,CD9,ACTB,CD68
## 3:           79   18 GP9,ITGA2B,CD9,CD151,CD63,ICAM2,...
## 
## $`5`
##                              pathway         pval        padj        ES
## 1:             CD4+ cytotoxic T cell 0.0000999900 0.002568843 0.8537503
## 2: Effector CD8+ memory T (Tem) cell 0.0000999900 0.002568843 0.8218874
## 3:      Megakaryocyte erythroid cell 0.0001004924 0.002568843 0.8644641
##         NES nMoreExtreme size                            leadingEdge
## 1: 2.269733            0   71    SPON2,GNLY,PRF1,PTGDS,GZMB,NKG7,...
## 2: 2.166913            0   62 SPON2,GNLY,GZMB,FGFBP2,KLRF1,KLRD1,...
## 3: 2.102014            0   23    GZMB,CD7,KLRB1,KLRD1,IL2RB,GZMA,...
## 
## $`6`
##             pathway         pval        padj        ES      NES nMoreExtreme
## 1:      CD4+ T cell 0.0001014816 0.005206526 0.8985023 1.769758            0
## 2: Activated T cell 0.0004203005 0.008105796 0.8828033 1.651863            3
## 3:      CD8+ T cell 0.0003074085 0.006916692 0.8470356 1.641751            2
##    size                          leadingEdge
## 1:   14   IL7R,LTB,CD3E,CD3D,CD5,TNFRSF4,...
## 2:    9 CD3E,CD3D,TNFRSF4,CD3G,CD28,CD27,...
## 3:   12      IL7R,CD3E,CD3D,CD5,CD2,CD3G,...
## 
## $`7`
##                              pathway       pval        padj        ES      NES
## 1:             CD4+ cytotoxic T cell 0.00009999 0.002383736 0.8720059 2.357424
## 2: Effector CD8+ memory T (Tem) cell 0.00009999 0.002383736 0.8328810 2.242414
## 3:               Natural killer cell 0.00010002 0.002383736 0.8444422 2.197239
##    nMoreExtreme size                           leadingEdge
## 1:            0   74   FGFBP2,GNLY,NKG7,CST7,GZMB,CTSW,...
## 2:            0   68 FGFBP2,GNLY,GZMB,GZMH,KLRF1,KLRD1,...
## 3:            0   41   GNLY,NKG7,GZMB,KLRF1,KLRD1,GZMA,...
## 
## $`8`
##              pathway        pval      padj        ES      NES nMoreExtreme size
## 1: Follicular B cell 0.009812022 0.1706815 0.7917685 1.518359           94   11
##                           leadingEdge
## 1: MS4A1,CD24,CD40,CD22,PAX5,EBF1,...
## 
## $`9`
##                  pathway         pval        padj        ES      NES
## 1:     Naive CD8+ T cell 0.0000999900 0.006053027 0.7754973 1.914284
## 2:     Naive CD4+ T cell 0.0001000500 0.006053027 0.8116887 1.887479
## 3: Central memory T cell 0.0007809885 0.019694812 0.9093896 1.701239
##    nMoreExtreme size                             leadingEdge
## 1:            0   73 CCR7,TCF7,PIK3IP1,LEF1,TRABD2A,LDHB,...
## 2:            0   29    CCR7,IL7R,TCF7,TSHZ2,TRABD2A,MAL,...
## 3:            6    6                     CCR7,IL7R,CD28,CD27
## 
## $`10`
##                         pathway         pval        padj        ES      NES
## 1: Megakaryocyte erythroid cell 0.0001011122 0.008190091 0.8301819 1.653260
## 2:                 Myeloid cell 0.0001002406 0.008190091 0.7868515 1.611696
## 3:                Lymphoid cell 0.0016443988 0.053278520 0.8249257 1.598846
##    nMoreExtreme size                            leadingEdge
## 1:            0   16  FCGR3A,PECAM1,CD68,ITGAX,SPN,CD86,...
## 2:            0   23 FCGR3A,CSF1R,PECAM1,CD68,ITGAX,SPN,...
## 3:           15   12              FCGR3A,CD68,ITGAX,SPN,CD4

可视化基因集富集分析注释到的细胞类型。

new.cluster.ids <- unlist(lapply(res, function(x) {
    as.data.frame(x)[1, 1]
}))
alldata$cellmarker_gsea <- new.cluster.ids[as.character(alldata@active.ident)]

cowplot::plot_grid(ncol = 2, DimPlot(alldata, label = T, group.by = "ref_gsea") + 
    NoAxes(), DimPlot(alldata, label = T, group.by = "cellmarker_gsea") + NoAxes())
image.png

基于以上分析,您认为这些预测方法可以很好的重叠吗?您在哪里看到最多的不一致之处?

在这种情况下,我们没有任何事实依据,也不能说“夹心法”效果最好。我们应该牢记,任何细胞类型分类方法都只是一种预测,我们仍然需要利用已有的生物学知识来判断预测结果是否有意义。

保存细胞类型预测的结果

saveRDS(ctrl, "data/results/ctrl13_qc_dr_int_cl_celltype.rds")
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Catalina 10.15.5
## 
## Matrix products: default
## BLAS/LAPACK: /Users/paulo.czarnewski/.conda/envs/scRNAseq2021/lib/libopenblasp-r0.3.12.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] fgsea_1.16.0    caret_6.0-86    lattice_0.20-41 scPred_1.9.0   
##  [5] rafalib_1.0.0   pheatmap_1.0.12 ggplot2_3.3.3   cowplot_1.1.1  
##  [9] dplyr_1.0.3     venn_1.9        Seurat_3.2.3    RJSONIO_1.3-1.4
## [13] optparse_1.6.6 
## 
## loaded via a namespace (and not attached):
##   [1] fastmatch_1.1-0       plyr_1.8.6            igraph_1.2.6         
##   [4] lazyeval_0.2.2        splines_4.0.3         BiocParallel_1.24.0  
##   [7] listenv_0.8.0         scattermore_0.7       digest_0.6.27        
##  [10] foreach_1.5.1         htmltools_0.5.1       fansi_0.4.2          
##  [13] magrittr_2.0.1        tensor_1.5            cluster_2.1.0        
##  [16] ROCR_1.0-11           limma_3.46.0          recipes_0.1.15       
##  [19] globals_0.14.0        gower_0.2.2           matrixStats_0.57.0   
##  [22] colorspace_2.0-0      ggrepel_0.9.1         xfun_0.20            
##  [25] crayon_1.3.4          jsonlite_1.7.2        spatstat_1.64-1      
##  [28] spatstat.data_1.7-0   survival_3.2-7        zoo_1.8-8            
##  [31] iterators_1.0.13      glue_1.4.2            polyclip_1.10-0      
##  [34] gtable_0.3.0          ipred_0.9-9           leiden_0.3.6         
##  [37] kernlab_0.9-29        future.apply_1.7.0    abind_1.4-5          
##  [40] scales_1.1.1          DBI_1.1.1             miniUI_0.1.1.1       
##  [43] Rcpp_1.0.6            viridisLite_0.3.0     xtable_1.8-4         
##  [46] reticulate_1.18       rsvd_1.0.3            stats4_4.0.3         
##  [49] lava_1.6.8.1          prodlim_2019.11.13    htmlwidgets_1.5.3    
##  [52] httr_1.4.2            getopt_1.20.3         RColorBrewer_1.1-2   
##  [55] ellipsis_0.3.1        ica_1.0-2             pkgconfig_2.0.3      
##  [58] farver_2.0.3          nnet_7.3-14           uwot_0.1.10          
##  [61] deldir_0.2-9          tidyselect_1.1.0      labeling_0.4.2       
##  [64] rlang_0.4.10          reshape2_1.4.4        later_1.1.0.1        
##  [67] munsell_0.5.0         tools_4.0.3           cli_2.2.0            
##  [70] generics_0.1.0        ggridges_0.5.3        evaluate_0.14        
##  [73] stringr_1.4.0         fastmap_1.0.1         yaml_2.2.1           
##  [76] goftest_1.2-2         ModelMetrics_1.2.2.2  knitr_1.30           
##  [79] fitdistrplus_1.1-3    admisc_0.11           purrr_0.3.4          
##  [82] RANN_2.6.1            pbapply_1.4-3         future_1.21.0        
##  [85] nlme_3.1-151          mime_0.9              formatR_1.7          
##  [88] compiler_4.0.3        beeswarm_0.2.3        plotly_4.9.3         
##  [91] png_0.1-7             spatstat.utils_1.20-2 tibble_3.0.5         
##  [94] stringi_1.5.3         highr_0.8             RSpectra_0.16-0      
##  [97] Matrix_1.3-2          vctrs_0.3.6           pillar_1.4.7         
## [100] lifecycle_0.2.0       lmtest_0.9-38         RcppAnnoy_0.0.18     
## [103] data.table_1.13.6     irlba_2.3.3           httpuv_1.5.5         
## [106] patchwork_1.1.1       R6_2.5.0              promises_1.1.1       
## [109] KernSmooth_2.23-18    gridExtra_2.3         vipor_0.4.5          
## [112] parallelly_1.23.0     codetools_0.2-18      MASS_7.3-53          
## [115] assertthat_0.2.1      withr_2.4.0           sctransform_0.3.2    
## [118] harmony_1.0           mgcv_1.8-33           parallel_4.0.3       
## [121] grid_4.0.3            rpart_4.1-15          timeDate_3043.102    
## [124] tidyr_1.1.2           class_7.3-17          rmarkdown_2.6        
## [127] Rtsne_0.15            pROC_1.17.0.1         shiny_1.5.0          
## [130] lubridate_1.7.9.2     ggbeeswarm_0.6.0

参考来源:https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/seurat/seurat_06_celltype.html

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 215,245评论 6 497
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,749评论 3 391
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 160,960评论 0 350
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,575评论 1 288
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,668评论 6 388
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,670评论 1 294
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,664评论 3 415
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,422评论 0 270
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,864评论 1 307
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,178评论 2 331
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,340评论 1 344
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,015评论 5 340
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,646评论 3 323
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,265评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,494评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,261评论 2 368
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,206评论 2 352

推荐阅读更多精彩内容