【单细胞组学】细胞表面蛋白辅助细胞分群

导言

我们之前介绍了CITE-seq的测序原理和使用CITE-seq-Count工具处理reads数据得到蛋白表达矩阵。

CITE-seq-Count输出文件结构如下：

OUTFOLDER/
-- umi_count/
-- -- matrix.mtx.gz
-- -- features.tsv.gz
-- -- barcodes.tsv.gz
-- read_count/
-- -- matrix.mtx.gz
-- -- features.tsv.gz
-- -- barcodes.tsv.gz
-- unmapped.csv
-- run_report.yaml

跟处理scRNA-seq数据的 cell ranger工具的输出结果有些类似。最后需要用到的是其中umi_count文件夹下的matrix.mtx.gz，features.tsv.gz，barcodes.tsv.gz这三个文件。

因为mRNA 表达和细胞表面蛋白表达是配套的数据，也就是每个细胞都得到了其mRNA全转录组和部分细胞表面蛋白的表达水平，那么分析这两个数据就有两种思路：

仅基于scRNA-seq 数据对细胞进行聚类、分群、降维等标准流程，然后将细胞表面蛋白表达水平（ADT值）作为附加信息进行可视化等分析[1]。这种分析方法跟常规的单细胞转录组数据分析流程没有太大差别。
同时基于scRNA-seq数据和细胞表面蛋白表达数据（ADT值）对细胞进行聚类、分群、降维等分析[2]。这种方法需要使用到weighted-nearest neighbor’ (WNN)等多模型数据整合算法来对数据进行整合。

WNN算法是一个无监督的框架，用于了解每个细胞中每种数据类型的相对效用，从而能够对多种数据类型进行综合分析。

Seurat官方示例教程中有提到，对30672个同时测了mRNA和25个细胞表面蛋白的来自骨髓的单细胞数据分析结果显示，细胞表面蛋白可以帮助一些免疫细胞状态的细分：

We find that the RNA analysis is more informative than the ADT analysis in identifying progenitor states (the ADT panel contains markers for differentiated cells), while the converse is true of T cell states (where the ADT analysis outperforms RNA).

对该骨髓来源来的免疫细胞数据，mRNA表达数据能更好的区分祖细胞的状态，细胞表面蛋白表达数据则能更好的区分T细胞的状态。当然这有可能是因为该数据检测的细胞表面蛋白抗体panel里，包括了针对分化细胞的marker而没有包括针对祖细胞状态的marker，毕竟它只包含了25个蛋白。

鉴于Seurat的示例数据里细胞表面蛋白的检测panel都比较小，为了验证细胞表面蛋白表达数据在帮助各种细胞类型细分时的效果，我们用之前提到的那篇文献中的BRCA数据进行分析，该研究测了118个细胞表面蛋白panel，有来自4个样本共7363个单细胞的scCITE-seq数据。

Seurat包分析CITE-seq数据

library(Seurat)
library(ggplot2)
library(patchwork)

###
adt=Read10X('./umi_count/',gene.column = 1)
med<-read.csv('./metadata.csv',row.names = 1)
counts <- Read10X(data.dir = path,gene.column=1)

brca.cite.med=med[med$cell_id%in%colnames(adt),]
table(brca.cite.med$celltype_major)
# B-cells              CAFs Cancer Epithelial       Endothelial           Myeloid 
# 631               686                 7               549              1000 
# Plasmablasts               PVL           T-cells 
# 151               972              3374 

#可以看到肿瘤上皮细胞非常少。为了方面后续分析，这里去掉这些肿瘤上皮细胞
brca.cite.med=brca.cite.med[brca.cite.med$celltype_major!='Cancer Epithelial',]
brca.cite.rna=counts[,rownames(brca.cite.med)]
brca.cite.adt=adt[,rownames(brca.cite.med)]

##
cols_major= RColorBrewer::brewer.pal(n=7,name='Paired')
names(cols_major)=unique(brca.cite.med$celltype_major)

###########################################################
### Define cellular states based on only mRNA expression ##
############################################################

###Setup a Seurat object, add the RNA and protein data
brca.cite <- CreateSeuratObject(counts = brca.cite.rna,
                                meta.data =brca.cite.med )
adt_assay <- CreateAssayObject(counts = brca.cite.adt)
# add this assay to the previously created Seurat object
brca.cite[["ADT"]] <- adt_assay

###Cluster cells on the basis of their scRNA-seq profiles
DefaultAssay(brca.cite) <- "RNA"
# perform visualization and clustering steps
brca.cite <- NormalizeData(brca.cite)
brca.cite <- FindVariableFeatures(brca.cite)
brca.cite <- ScaleData(brca.cite)
brca.cite <- RunPCA(brca.cite, verbose = FALSE)

# 因为我们已经有细胞的分群信息，所以不再进行聚类和分群
# brca.cite <- FindNeighbors(brca.cite, dims = 1:30)
# brca.cite <- FindClusters(brca.cite, resolution = 0.8, verbose = FALSE)
brca.cite <- RunUMAP(brca.cite,reduction = "pca", dims = 1:30) 

p1=DimPlot(brca.cite,group.by = 'orig.ident')+NoLegend()
p2=DimPlot(brca.cite,group.by = 'celltype_major', cols = cols_major,label = T,label.size = 5,repel = T)
p1|p2

###Normalize ADT data,
DefaultAssay(brca.cite) <- "ADT"
brca.cite <- NormalizeData(brca.cite, normalization.method = "CLR", margin = 2)
DefaultAssay(brca.cite) <- "RNA"

###Identify cell surface markers for scRNA-seq clusters
Idents(brca.cite)=brca.cite$celltype_major
adt_markers <- FindAllMarkers(brca.cite, assay = "ADT")
rna_markers <- FindAllMarkers(brca.cite, assay = "RNA")

###查看一些基质细胞和免疫细胞标志基因的蛋白和mRNA的表达：
p1 <- FeaturePlot(brca.cite, c("adt_CD45",'adt_CD31','adt_CD19',"adt_MCAM"),slot='data', cols = viridis(15),ncol = 4)
p2 <- FeaturePlot(brca.cite, c("rna_PTPRC",'rna_PECAM1','rna_CD19',"rna_MCAM"),slot='data', cols =inferno(15),ncol = 4)
p1 /p2

可以看到CD45和CD19这两个基因mRNA和蛋白表达存在较大差异。

################################################################
##### WNN: Define cellular states based on both data types #####
################################################################
brca.cite_wnn <- CreateSeuratObject(counts = brca.cite.rna,meta.data =brca.cite.med )
adt_assay <- CreateAssayObject(counts = brca.cite.adt)
# add this assay to the previously created Seurat object
brca.cite_wnn[["ADT"]] <- adt_assay

#### Pre-processing and dimensional reduction on both assays independently.
DefaultAssay(brca.cite_wnn) <- 'RNA'
brca.cite_wnn <- NormalizeData(brca.cite_wnn) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA() 

DefaultAssay(brca.cite_wnn) <- 'ADT'
VariableFeatures(brca.cite_wnn) <- rownames(brca.cite_wnn[["ADT"]])
brca.cite_wnn <- NormalizeData(brca.cite_wnn, normalization.method = 'CLR', margin = 2) %>% 
  ScaleData() %>% RunPCA(reduction.name = 'apca')

### Identify multimodal neighbors
brca.cite_wnn <- FindMultiModalNeighbors(
  brca.cite_wnn, reduction.list = list("pca", "apca"),
  dims.list = list(1:30, 1:20), modality.weight.name = "RNA.weight"
)
### Clustering cells
brca.cite_wnn <- RunUMAP(brca.cite_wnn, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")
#brca.cite_wnn <- FindClusters(brca.cite_wnn, graph.name = "wsnn", algorithm = 3, resolution = 2, verbose = FALSE)
p1 <- DimPlot(brca.cite_wnn, reduction = 'wnn.umap', group.by = 'orig.ident') + NoLegend()
p2 <- DimPlot(brca.cite_wnn, reduction = 'wnn.umap', group.by = 'celltype_major', cols =cols_major,label = T,label.size = 5,repel = T) #+ NoLegend()
p1 + p2

p5 <- FeaturePlot(brca.cite_wnn, c("adt_CD45",'adt_CD31','adt_CD19',"adt_MCAM"),slot='data', cols = viridis(15),ncol = 4)
p6 <- FeaturePlot(brca.cite_wnn, c("rna_PTPRC",'rna_PECAM1','rna_CD19',"rna_MCAM"),slot='data', cols =inferno(15),ncol = 4)
p5 /p6

###Compareing the UMAP visualization based on only the RNA, protein data and both of them
brca.cite_wnn <- RunUMAP(brca.cite_wnn, reduction = 'pca', dims = 1:30, assay = 'RNA', 
                         reduction.name = 'rna.umap', reduction.key = 'rnaUMAP_')
brca.cite_wnn <- RunUMAP(brca.cite_wnn, reduction = 'apca', dims = 1:20, assay = 'ADT', 
                         reduction.name = 'adt.umap', reduction.key = 'adtUMAP_')

p3 <- DimPlot(brca.cite_wnn, reduction = 'rna.umap', group.by = 'celltype_major', label = F, cols =cols_major)+ ggtitle("RNA") +theme(plot.title = element_text(hjust = 0.5)) + NoLegend()
p4 <- DimPlot(brca.cite_wnn, reduction = 'adt.umap', group.by = 'celltype_major', label = F, cols =cols_major)+ ggtitle("ADT") +theme(plot.title = element_text(hjust = 0.5)) + NoLegend()
p5<- DimPlot(brca.cite_wnn, reduction = 'wnn.umap', group.by = 'celltype_major', label = F, cols =cols_major)+ ggtitle("WNN") +theme(plot.title = element_text(hjust = 0.5),                                                                                                                                  legend.position = "bottom") #+ NoLegend()
p5+p3+p4

再来比较一下在细分时，两种方法的效果：

用经过WNN整合后的数据比仅用mRNA数据识别细胞类型效果要好

Reference：

[1].https://satijalab.org/seurat/articles/multimodal_vignette.html

[2].https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis.html

[3].https://www.jianshu.com/p/ceee6aa17335

欢迎关注同名公众号SciNote

【单细胞组学】细胞表面蛋白辅助细胞分群

【单细胞组学】细胞表面蛋白辅助细胞分群

相关阅读更多精彩内容

友情链接更多精彩内容