常用的细胞通讯软件:
- CellphoneDB:是公开的人工校正的,储存受体、配体以及两种相互作用的数据库。此外,还考虑了结构组成,能够描述异构复合物。(配体-受体+多聚体)
- iTALK:通过平均表达量方式,筛选高表达的胚体和受体,根据结果作圈图。(配体-受体)
- CellChat:CellChat将细胞的基因表达数据作为输入,并结合配体受体及其辅助因子的相互作用来模拟细胞间通讯。(配体-受体+多聚体+辅因子)
- NicheNet // NicheNet多样本分析 // 目标基因的配体和靶基因活性预测:通过将相互作用细胞的表达数据与信号和基因调控网络的先验知识相结合来预测相互作用细胞之间的配体-靶标联系的方法。( 配体-受体+信号通路)
附:NicheNet使用的常见问题汇总其它细胞互作软件还包括
Celltalker
,SingleCellSignalR
,scTensor
和SoptSC
(这几个也是基于配体-受体相互作用)
NicheNet 在其分析单细胞转录组数据时,首先构建了细胞内配体、受体、信号蛋白、转录调控因子和靶基因互作信号网络先验知识数据库,然后基于该数据库,通过基因表达模式的强相关性,最终确定了细胞类型之间的潜在通讯,并且能够有效地预测配体活性。
原理简介
NicheNet是一种模拟细胞间相互作用如何影响细胞基因表达的计算方法。真实的细胞通讯过程是配体-受体-信号蛋白-转录调控因子-靶基因, 因此仅仅分析配受体之间的相互作用存在一定的局限性。该方法的创新之处在于通过结合细胞表达数据、信号、基因调控网络的先验知识,来预测相互作用的细胞之间的配体-靶基因的连接情况。目前NicheNet可以处理人或者小鼠数据,以人类或小鼠的基因表达数据作为输入,并将其与通过整合配体到目标信号路径构建的先验模型进行结合。
NicheNet分析中依赖的先验模型,主要是为了表征现有知识对配体可以调节目标基因表达的支持程度,为了计算这种配体-靶标调节潜力, NicheNet的先验模型中不仅包括配体-受体的相互作用信息,还包括了细胞内信号传导信息。因此,NicheNet除了预测哪些配体影响另一个细胞的基因表达外,还可以预测哪些靶基因受到这些配体的影响,以及哪些信号介质可能参与其中。
首先,NicheNet收集涵盖配体-受体(例如,KEGG数据库)、细胞信号转导(例如,蛋白质-蛋白质相互作用和激酶-底物相互作用)、基因调控相互作用(例如,基于ChIP数据和motif的推断)的公共数据信息;然后,将这些独立的数据信息整合到两个加权网络中:(1)配体信号网络,其中包含蛋白质 - 蛋白质相互作用,涵盖从配体到下游转录调节因子的信号通路;(2) 基因调控网络,其中包含转录调控因子和靶基因之间的基因调控相互作用。
为了让信息数据源对最终模型做出更多贡献,该软件在集成过程中对每个数据源进行了加权,这些数据源权重是通过基于模型的参数优化自动确定的,以提高配体-目标预测的准确性。
最后,NicheNet结合配体信号传导和基因调控网络来计算所有配体和靶基因对之间的调控潜力评分:如果靶基因的调节因子位于配体信号网络的下游,则配体-靶标对具有高调节潜力。为了计算这一评分,该方法在整合的网络上使用网络传播方法来传播来自配体的信号,从配体开始,流经受体、信号蛋白、转录调节因子,最终在目标基因处结束。
之前写过NicheNet的标准分析pipeline,实际上做细胞互作分析的时候我们更多的还是在做样本间的互作差异比较。平常我用CellChat比较多,但其实NicheNet也可以做多样本互作比较,而且效果更好。
0. 读入expression data of interest
, NicheNet ligand-receptor network
和 ligand-target matrix
加载所需要的包
library(nichenetr)
library(RColorBrewer)
library(tidyverse)
library(Seurat) #
读入演示数据
seurat_obj = readRDS(url("https://zenodo.org/record/4675430/files/seurat_obj_hnscc.rds"))
p1=DimPlot(seurat_obj, group.by = "celltype") # user adaptation required on own dataset
p2=DimPlot(seurat_obj, group.by = "pEMT") # user adaptation required on own dataset
p1|p2
table(seurat_obj@meta.data$celltype, seurat_obj@meta.data$pEMT)
# High Low
# CAF 396 104
# Endothelial 105 53
# Malignant 1093 549
# Myeloid 92 7
# myofibroblast 382 61
# T.cell 689 3
这个演示里面比较的是pEMT-high-niche和pEMT-low-niche,换成不同组都一样的。
seurat_obj@meta.data$celltype_aggregate = paste(seurat_obj@meta.data$celltype, seurat_obj@meta.data$pEMT,sep = "_") # user adaptation required on own dataset
DimPlot(seurat_obj, group.by = "celltype_aggregate")
seurat_obj@meta.data$celltype_aggregate %>% table() %>% sort(decreasing = TRUE)
## .
## Malignant_High T.cell_High Malignant_Low CAF_High myofibroblast_High Endothelial_High
## 1093 689 549 396 382 105
## CAF_Low Myeloid_High myofibroblast_Low Endothelial_Low Myeloid_Low T.cell_Low
## 104 92 61 53 7 3
celltype_id = "celltype_aggregate" # metadata column name of the cell type of interest
seurat_obj = SetIdent(seurat_obj, value = seurat_obj[[celltype_id]])
读入NicheNet受体靶基因矩阵
(里面的数值可以理解为亲和力)和受体配体网络
(受体配体是否结合)
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
## CXCL1 CXCL2 CXCL3 CXCL5 PPBP
## A1BG 3.534343e-04 4.041324e-04 3.729920e-04 3.080640e-04 2.628388e-04
## A1BG-AS1 1.650894e-04 1.509213e-04 1.583594e-04 1.317253e-04 1.231819e-04
## A1CF 5.787175e-04 4.596295e-04 3.895907e-04 3.293275e-04 3.211944e-04
## A2M 6.027058e-04 5.996617e-04 5.164365e-04 4.517236e-04 4.590521e-04
## A2M-AS1 8.898724e-05 8.243341e-05 7.484018e-05 4.912514e-05 5.120439e-05
lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))
lr_network = lr_network %>% mutate(bonafide = ! database %in% c("ppi_prediction","ppi_prediction_go"))
lr_network = lr_network %>% dplyr::rename(ligand = from, receptor = to) %>% distinct(ligand, receptor, bonafide)
head(lr_network)
## # A tibble: 6 x 3
## ligand receptor bonafide
## <chr> <chr> <lgl>
## 1 CXCL1 CXCR2 TRUE
## 2 CXCL2 CXCR2 TRUE
## 3 CXCL3 CXCR2 TRUE
## 4 CXCL5 CXCR2 TRUE
## 5 PPBP CXCR2 TRUE
## 6 CXCL6 CXCR2 TRUE
table(lr_network$bonafide)
# FALSE TRUE
# 10629 1390
###?为什么这么多是false?
如果分析的是小鼠的数据,需要先做一下基因的同源转换
organism = "human" # user adaptation required on own dataset
if(organism == "mouse"){
lr_network = lr_network %>% mutate(ligand = convert_human_to_mouse_symbols(ligand), receptor = convert_human_to_mouse_symbols(receptor)) %>% drop_na()
colnames(ligand_target_matrix) = ligand_target_matrix %>% colnames() %>% convert_human_to_mouse_symbols()
rownames(ligand_target_matrix) = ligand_target_matrix %>% rownames() %>% convert_human_to_mouse_symbols()
ligand_target_matrix = ligand_target_matrix %>% .[!is.na(rownames(ligand_target_matrix)), !is.na(colnames(ligand_target_matrix))]
}
1. Define the niches/microenvironments of interest
每个niche应该至少有一个“sender/niche”细胞群和一个“receiver/target”细胞群。
在这个演示数据集中,我们想要去查看pEMT high和pEMT low的肿瘤组织中免疫细胞对肿瘤细胞的作用差异。因此“Malignant_High”和“Malignant_Low”被定义为“receiver/target”细胞群,其它细胞被定义为“sender/niche”细胞群。注意:T.Cell和Myeloid细胞只有在pEMT-High样本中才被定义为sender,因为pEMT-low样本中这两类细胞数目太少了。
⚠️也就是说,NicheNet在做组间比较的时候,可以把condition-specific的细胞群考虑在内。(比较的是所有sender细胞的组间差异,而不是细胞特异性组间差异)
! Important: your receiver cell type should consist of 1 cluster!
(可以理解为:解析组织微环境如何决定某种细胞的行为)
niches = list(
"pEMT_High_niche" = list(
"sender" = c("myofibroblast_High", "Endothelial_High", "CAF_High", "T.cell_High", "Myeloid_High"),
"receiver" = c("Malignant_High")),
"pEMT_Low_niche" = list(
"sender" = c("myofibroblast_Low", "Endothelial_Low", "CAF_Low"),
"receiver" = c("Malignant_Low"))
) # user adaptation required on own dataset
2. Calculate differential expression between the niches (得到的是差异性受体配体矩阵
)
In this step, we will determine DE between the different niches for both senders and receivers to define the DE of L-R pairs.
2.1 计算DE
calculate_niche_de
Calculate differential expression of cell types in one niche versus all other niches of interest. This is possible for sender cell types and receiver cell types.
计算差异基因的方法默认是Seurat Wilcoxon test(也可以使用其它方法)。
assay_oi = "SCT" # other possibilities: RNA,...
DE_sender = calculate_niche_de(seurat_obj = seurat_obj %>% subset(features = lr_network$ligand %>% unique()), niches = niches, type = "sender", assay_oi = assay_oi) # only ligands important for sender cell types
## [1] "Calculate Sender DE between: myofibroblast_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: myofibroblast_High and Endothelial_Low"
## [3] "Calculate Sender DE between: myofibroblast_High and CAF_Low"
## [1] "Calculate Sender DE between: Endothelial_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: Endothelial_High and Endothelial_Low"
## [3] "Calculate Sender DE between: Endothelial_High and CAF_Low"
## [1] "Calculate Sender DE between: CAF_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: CAF_High and Endothelial_Low"
## [3] "Calculate Sender DE between: CAF_High and CAF_Low"
## [1] "Calculate Sender DE between: T.cell_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: T.cell_High and Endothelial_Low"
## [3] "Calculate Sender DE between: T.cell_High and CAF_Low"
## [1] "Calculate Sender DE between: Myeloid_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: Myeloid_High and Endothelial_Low"
## [3] "Calculate Sender DE between: Myeloid_High and CAF_Low"
## [1] "Calculate Sender DE between: myofibroblast_Low and myofibroblast_High"
## [2] "Calculate Sender DE between: myofibroblast_Low and Endothelial_High"
## [3] "Calculate Sender DE between: myofibroblast_Low and CAF_High"
## [4] "Calculate Sender DE between: myofibroblast_Low and T.cell_High"
## [5] "Calculate Sender DE between: myofibroblast_Low and Myeloid_High"
## [1] "Calculate Sender DE between: Endothelial_Low and myofibroblast_High"
## [2] "Calculate Sender DE between: Endothelial_Low and Endothelial_High"
## [3] "Calculate Sender DE between: Endothelial_Low and CAF_High"
## [4] "Calculate Sender DE between: Endothelial_Low and T.cell_High"
## [5] "Calculate Sender DE between: Endothelial_Low and Myeloid_High"
## [1] "Calculate Sender DE between: CAF_Low and myofibroblast_High"
## [2] "Calculate Sender DE between: CAF_Low and Endothelial_High"
## [3] "Calculate Sender DE between: CAF_Low and CAF_High"
## [4] "Calculate Sender DE between: CAF_Low and T.cell_High"
## [5] "Calculate Sender DE between: CAF_Low and Myeloid_High"
DE_receiver = calculate_niche_de(seurat_obj = seurat_obj %>% subset(features = lr_network$receptor %>% unique()), niches = niches, type = "receiver", assay_oi = assay_oi) # only receptors now, later on: DE analysis to find targets
## # A tibble: 1 x 2
## receiver receiver_other_niche
## <chr> <chr>
## 1 Malignant_High Malignant_Low
## [1] "Calculate receiver DE between: Malignant_High and Malignant_Low"
## [1] "Calculate receiver DE between: Malignant_Low and Malignant_High"
可以看到,它是先计算了Sender:high的5种sender细胞分别和low的3中sender细胞的Sender DE,又反过来计算了low的3中sender细胞分别和high的5种sender细胞的DE。
然后计算了Receiver:肿瘤细胞high-low的差异基因和low-high的差异基因。
这样把细胞类型分开挨个计算而不是把所有sender和receiver细胞合并计算的意义是避免差异分析的结果主要被丰度高的细胞驱动。
正着计算一遍,反着计算一遍,a-->b上调的/下调的基因和b-->a下调的/上调的难道不是一样的吗?
随后我去验证了一下,果然如此。以receiver中的IL6为例
DE_sender = DE_sender %>% mutate(avg_log2FC = ifelse(avg_log2FC == Inf, max(avg_log2FC[is.finite(avg_log2FC)]), ifelse(avg_log2FC == -Inf, min(avg_log2FC[is.finite(avg_log2FC)]), avg_log2FC)))
DE_receiver = DE_receiver %>% mutate(avg_log2FC = ifelse(avg_log2FC == Inf, max(avg_log2FC[is.finite(avg_log2FC)]), ifelse(avg_log2FC == -Inf, min(avg_log2FC[is.finite(avg_log2FC)]), avg_log2FC)))
2.2 处理DE结果
expression_pct = 0.10
DE_sender_processed = process_niche_de(DE_table = DE_sender, niches = niches, expression_pct = expression_pct, type = "sender")
DE_receiver_processed = process_niche_de(DE_table = DE_receiver, niches = niches, expression_pct = expression_pct, type = "receiver")
以receiver中的IL6为例
process_niche_de
是对calculate_niche_de的结果做了上上图中所标的整合(红色的整合为了五种pEMT_High_niche,下面三种颜色分别整合为三种pEMT_Low_niche)
2.3 Combine sender-receiver DE based on L-R pairs:
如前所述,来自一种sender细胞的差异表达的配体是通过计算该样品这种sender细胞和另一样品中所有sender细胞得到的。因此我们有多种方法总结得到细胞类型的差异表达配体。我们可以使用average LFC,也可以使用minimum LFC。但是更推荐使用minimum LFC
。因为它是评估配体表达的最强的特异性指标,因为高的min LFC意味着和niche 2中的所有细胞类型相比,这个配体在niche 1的这个细胞类型中表达更强(如果使用average LFC,则不能排除niche 2中一种或多种细胞也很强的表达这个配体)。
specificity_score_LR_pairs = "min_lfc"
DE_sender_receiver = combine_sender_receiver_de(DE_sender_processed, DE_receiver_processed, lr_network, specificity_score = specificity_score_LR_pairs)
table(DE_sender_receiver$sender)
# CAF_High CAF_Low Endothelial_High Endothelial_Low
# 8171 8171 8171 8171
# Myeloid_High myofibroblast_High myofibroblast_Low T.cell_High
# 8171 8171 8171 8171
table(DE_sender_receiver$receiver)
# Malignant_High Malignant_Low
# 40855 24513
View(DE_sender_receiver)
这一步主要得到了DE_sender_receiver
这个对象,也就是不同niche中的差异基因。
3. 计算空间互作差异(可选)
如果有空间转录组数据,把下面两行代码设置成TRUE,没有的话直接运行下面的代码即可。
include_spatial_info_sender = FALSE # if not spatial info to include: put this to false # user adaptation required on own dataset
include_spatial_info_receiver = FALSE # if spatial info to include: put this to true # user adaptation required on own dataset
spatial_info = tibble(celltype_region_oi = "CAF_High", celltype_other_region = "myofibroblast_High", niche = "pEMT_High_niche", celltype_type = "sender") # user adaptation required on own dataset
specificity_score_spatial = "lfc"
# this is how this should be defined if you don't have spatial info
# mock spatial info
if(include_spatial_info_sender == FALSE & include_spatial_info_receiver == FALSE){
spatial_info = tibble(celltype_region_oi = NA, celltype_other_region = NA) %>% mutate(niche = niches %>% names() %>% head(1), celltype_type = "sender")
}
if(include_spatial_info_sender == TRUE){
sender_spatial_DE = calculate_spatial_DE(seurat_obj = seurat_obj %>% subset(features = lr_network$ligand %>% unique()), spatial_info = spatial_info %>% filter(celltype_type == "sender"))
sender_spatial_DE_processed = process_spatial_de(DE_table = sender_spatial_DE, type = "sender", lr_network = lr_network, expression_pct = expression_pct, specificity_score = specificity_score_spatial)
# add a neutral spatial score for sender celltypes in which the spatial is not known / not of importance
sender_spatial_DE_others = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "sender", lr_network = lr_network)
sender_spatial_DE_processed = sender_spatial_DE_processed %>% bind_rows(sender_spatial_DE_others)
sender_spatial_DE_processed = sender_spatial_DE_processed %>% mutate(scaled_ligand_score_spatial = scale_quantile_adapted(ligand_score_spatial))
} else {
# # add a neutral spatial score for all sender celltypes (for none of them, spatial is relevant in this case)
sender_spatial_DE_processed = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "sender", lr_network = lr_network)
sender_spatial_DE_processed = sender_spatial_DE_processed %>% mutate(scaled_ligand_score_spatial = scale_quantile_adapted(ligand_score_spatial))
}
## [1] "Calculate Spatial DE between: CAF_High and myofibroblast_High"
if(include_spatial_info_receiver == TRUE){
receiver_spatial_DE = calculate_spatial_DE(seurat_obj = seurat_obj %>% subset(features = lr_network$receptor %>% unique()), spatial_info = spatial_info %>% filter(celltype_type == "receiver"))
receiver_spatial_DE_processed = process_spatial_de(DE_table = receiver_spatial_DE, type = "receiver", lr_network = lr_network, expression_pct = expression_pct, specificity_score = specificity_score_spatial)
# add a neutral spatial score for receiver celltypes in which the spatial is not known / not of importance
receiver_spatial_DE_others = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "receiver", lr_network = lr_network)
receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% bind_rows(receiver_spatial_DE_others)
receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% mutate(scaled_receptor_score_spatial = scale_quantile_adapted(receptor_score_spatial))
} else {
# # add a neutral spatial score for all receiver celltypes (for none of them, spatial is relevant in this case)
receiver_spatial_DE_processed = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "receiver", lr_network = lr_network)
receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% mutate(scaled_receptor_score_spatial = scale_quantile_adapted(receptor_score_spatial))
}
4. 计算配体活性,推断active ligand-target links
在这一步中,我们将要预测不同niches中receiver细胞类型的每个配体的活性。(和常规NicheNet的配体活性分析类似)。
⚠️第2步是
受体 -- 配体
,这一步是配体--靶基因
。
为了计算配体活性,我们首先需要在每个niche中分别定义一个感兴趣的基因集。在这个示例中,pEMT-high的基因集是和pEMT-low肿瘤相比,pEMT-high中的上调基因。pEMT-low的基因集则相反。
lfc_cutoff = 0.15 # recommended for 10x as min_lfc cutoff.
specificity_score_targets = "min_lfc"
DE_receiver_targets = calculate_niche_de_targets(seurat_obj = seurat_obj, niches = niches, lfc_cutoff = lfc_cutoff, expression_pct = expression_pct, assay_oi = assay_oi)
## [1] "Calculate receiver DE between: Malignant_High and Malignant_Low"
## [1] "Calculate receiver DE between: Malignant_Low and Malignant_High"
DE_receiver_processed_targets = process_receiver_target_de(DE_receiver_targets = DE_receiver_targets, niches = niches, expression_pct = expression_pct, specificity_score = specificity_score_targets)
View(DE_receiver_processed_targets)
background = DE_receiver_processed_targets %>% pull(target) %>% unique()
geneset_niche1 = DE_receiver_processed_targets %>% filter(receiver == niches[[1]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
geneset_niche2 = DE_receiver_processed_targets %>% filter(receiver == niches[[2]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
# Good idea to check which genes will be left out of the ligand activity analysis (=when not present in the rownames of the ligand-target matrix).
# If many genes are left out, this might point to some issue in the gene naming (eg gene aliases and old gene symbols, bad human-mouse mapping)
geneset_niche1 %>% setdiff(rownames(ligand_target_matrix))
geneset_niche2 %>% setdiff(rownames(ligand_target_matrix))
length(geneset_niche1) #niche1中受体的target
## [1] 1668
length(geneset_niche2) #niche2中受体的target
## [1] 2889
在做配体活性分析之前,最好还是做一下基因集中的基因数的检测。一般认为对配体活性分析来说,感兴趣的基因集中有20-1000基因是比较合适的。如果得到的DE基因数过多,推荐使用更高的lfc_cutoff
阈值。在有>2的receiver细胞/niches时或,我们建议使用0.15的cutoff值。如果只有2组receiver细胞/niches时,我们建议使用更高的阈值(比如0.25)。如果是测序深度比较深的数据比如Smart-seq2,同样建议使用更高的阈值。
在这个演示数据中,我们使用的是Smart-seq2的数据,而且只有比较了2个niches,所以我们使用高LFC阈值以得到更少的DE基因(更高的阈值得到的DE基因 更少,可信度更高)。
lfc_cutoff = 0.75
specificity_score_targets = "min_lfc"
DE_receiver_targets = calculate_niche_de_targets(seurat_obj = seurat_obj, niches = niches, lfc_cutoff = lfc_cutoff, expression_pct = expression_pct, assay_oi = assay_oi)
DE_receiver_processed_targets = process_receiver_target_de(DE_receiver_targets = DE_receiver_targets, niches = niches, expression_pct = expression_pct, specificity_score = specificity_score_targets)
background = DE_receiver_processed_targets %>% pull(target) %>% unique()
geneset_niche1 = DE_receiver_processed_targets %>% filter(receiver == niches[[1]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
geneset_niche2 = DE_receiver_processed_targets %>% filter(receiver == niches[[2]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
# Good idea to check which genes will be left out of the ligand activity analysis (=when not present in the rownames of the ligand-target matrix).
# If many genes are left out, this might point to some issue in the gene naming (eg gene aliases and old gene symbols, bad human-mouse mapping)
geneset_niche1 %>% setdiff(rownames(ligand_target_matrix))
## [1] "ANXA8L2" "PRKCDBP" "IL8" "PTRF" "SEPP1" "C1orf186"
geneset_niche2 %>% setdiff(rownames(ligand_target_matrix))
## [1] "LOC344887" "AGPAT9" "C1orf110" "KIAA1467" "LOC100292680" "EPT1" "CT45A4"
length(geneset_niche1)
## [1] 169
length(geneset_niche2)
## [1] 136
top_n_target = 250
niche_geneset_list = list(
"pEMT_High_niche" = list(
"receiver" = niches[[1]]$receiver,
"geneset" = geneset_niche1,
"background" = background),
"pEMT_Low_niche" = list(
"receiver" = niches[[2]]$receiver,
"geneset" = geneset_niche2 ,
"background" = background)
)
ligand_activities_targets = get_ligand_activities_targets(niche_geneset_list = niche_geneset_list, ligand_target_matrix = ligand_target_matrix, top_n_target = top_n_target)
## [1] "Calculate Ligand activities for: Malignant_High"
## [1] "Calculate Ligand activities for: Malignant_Low"
上面这一步经常出现这个warning
5. 计算不同感兴趣细胞群中的受体、配体和靶基因的(scaled)表达 (log expression values and expression fractions)
在这一步中,我们将会计算受体、配体和靶基因在不同细胞群中的平均(scaled)表达和表达fraction。这里是使用DotPlot展示的,也可以用其他方式展示。
features_oi = union(lr_network$ligand, lr_network$receptor) %>% union(ligand_activities_targets$target) %>% setdiff(NA)
# 上一步得到1597个feature
dotplot = suppressWarnings(Seurat::DotPlot(seurat_obj %>% subset(idents = niches %>% unlist() %>% unique()), features = features_oi, assay = assay_oi))
exprs_tbl = dotplot$data %>% as_tibble()
exprs_tbl = exprs_tbl %>% rename(celltype = id, gene = features.plot, expression = avg.exp, expression_scaled = avg.exp.scaled, fraction = pct.exp) %>%
mutate(fraction = fraction/100) %>% as_tibble() %>% select(celltype, gene, expression, expression_scaled, fraction) %>% distinct() %>% arrange(gene) %>% mutate(gene = as.character(gene))
exprs_tbl_ligand = exprs_tbl %>% filter(gene %in% lr_network$ligand) %>% rename(sender = celltype, ligand = gene, ligand_expression = expression, ligand_expression_scaled = expression_scaled, ligand_fraction = fraction)
exprs_tbl_receptor = exprs_tbl %>% filter(gene %in% lr_network$receptor) %>% rename(receiver = celltype, receptor = gene, receptor_expression = expression, receptor_expression_scaled = expression_scaled, receptor_fraction = fraction)
exprs_tbl_target = exprs_tbl %>% filter(gene %in% ligand_activities_targets$target) %>% rename(receiver = celltype, target = gene, target_expression = expression, target_expression_scaled = expression_scaled, target_fraction = fraction)
dotplot
exprs_tbl_ligand = exprs_tbl_ligand %>% mutate(scaled_ligand_expression_scaled = scale_quantile_adapted(ligand_expression_scaled)) %>% mutate(ligand_fraction_adapted = ligand_fraction) %>% mutate_cond(ligand_fraction >= expression_pct, ligand_fraction_adapted = expression_pct) %>% mutate(scaled_ligand_fraction_adapted = scale_quantile_adapted(ligand_fraction_adapted))
exprs_tbl_receptor = exprs_tbl_receptor %>% mutate(scaled_receptor_expression_scaled = scale_quantile_adapted(receptor_expression_scaled)) %>% mutate(receptor_fraction_adapted = receptor_fraction) %>% mutate_cond(receptor_fraction >= expression_pct, receptor_fraction_adapted = expression_pct) %>% mutate(scaled_receptor_fraction_adapted = scale_quantile_adapted(receptor_fraction_adapted))
这一步得到了ligand, receptor和target的表达表
。以exprs_tbl_ligand为例,每个表中都有ligand/ receptor/ target的细胞类型,表达量和在细胞中的表达百分比。
6. Expression fraction and receptor
在这一步中,我们将会基于受体表达强度计算配体-受体互作,基于此,我们将对各细胞类型里每个配体的受体进行打分,表达最强的受体将被给予最高的评分。这不会影响随后对单个配体的排序,但是将会帮助我们对每个配体最重要的受体进行排序。(next to other factors regarding the receptor - see later).
exprs_sender_receiver = lr_network %>%
inner_join(exprs_tbl_ligand, by = c("ligand")) %>%
inner_join(exprs_tbl_receptor, by = c("receptor")) %>% inner_join(DE_sender_receiver %>% distinct(niche, sender, receiver))
ligand_scaled_receptor_expression_fraction_df = exprs_sender_receiver %>% group_by(ligand, receiver) %>% mutate(rank_receptor_expression = dense_rank(receptor_expression), rank_receptor_fraction = dense_rank(receptor_fraction)) %>% mutate(ligand_scaled_receptor_expression_fraction = 0.5*( (rank_receptor_fraction / max(rank_receptor_fraction)) + ((rank_receptor_expression / max(rank_receptor_expression))) ) ) %>% distinct(ligand, receptor, receiver, ligand_scaled_receptor_expression_fraction, bonafide) %>% distinct() %>% ungroup()
View(ligand_scaled_receptor_expression_fraction_df)
7. Prioritization of ligand-receptor and ligand-target links
在这一步中,我们将会结合上面所有的计算结果来对ligand-receptor-target之间的links进行排序。
We scale every property of interest between 0 and 1, and the final prioritization score is a weighted sum of the scaled scores of all the properties of interest.
prioritizing_weights
由以下代码中的12个参数综合决定。
prioritizing_weights = c("scaled_ligand_score" = 5,
"scaled_ligand_expression_scaled" = 1,
"ligand_fraction" = 1,
"scaled_ligand_score_spatial" = 2,
"scaled_receptor_score" = 0.5,
"scaled_receptor_expression_scaled" = 0.5,
"receptor_fraction" = 1,
"ligand_scaled_receptor_expression_fraction" = 1,
"scaled_receptor_score_spatial" = 0,
"scaled_activity" = 0,
"scaled_activity_normalized" = 1,
"bona_fide" = 1)
上面的参数是可变的,需要根据不同的需求设置。(每个参数的含义见:Differential NicheNet analysis between conditions of interest)
output = list(DE_sender_receiver = DE_sender_receiver, ligand_scaled_receptor_expression_fraction_df = ligand_scaled_receptor_expression_fraction_df, sender_spatial_DE_processed = sender_spatial_DE_processed, receiver_spatial_DE_processed = receiver_spatial_DE_processed,
ligand_activities_targets = ligand_activities_targets, DE_receiver_processed_targets = DE_receiver_processed_targets, exprs_tbl_ligand = exprs_tbl_ligand, exprs_tbl_receptor = exprs_tbl_receptor, exprs_tbl_target = exprs_tbl_target)
prioritization_tables = get_prioritization_tables(output, prioritizing_weights)
##查看list基本结构
prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(receiver == niches[[1]]$receiver) %>% head(10)
prioritization_tables$prioritization_tbl_ligand_target %>% filter(receiver == niches[[1]]$receiver) %>% head(10)
prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(receiver == niches[[2]]$receiver) %>% head(10)
prioritization_tables$prioritization_tbl_ligand_target %>% filter(receiver == niches[[2]]$receiver) %>% head(10)
8. 可视化
8.1 Differential expression of ligand and expression
在可视化之前,我们需要先定义每个niche中最重要的配体受体对。We will do this by first determining for which niche the highest score is found for each ligand/ligand-receptor pair. 然后我们就可以得到每个niche中的top 50 ligands。
top_ligand_niche_df = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, receptor, prioritization_score) %>% group_by(ligand) %>% top_n(1, prioritization_score) %>% ungroup() %>% select(ligand, receptor, niche) %>% rename(top_niche = niche)
top_ligand_receptor_niche_df = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, receptor, prioritization_score) %>% group_by(ligand, receptor) %>% top_n(1, prioritization_score) %>% ungroup() %>% select(ligand, receptor, niche) %>% rename(top_niche = niche)
ligand_prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, prioritization_score) %>% group_by(ligand, niche) %>% top_n(1, prioritization_score) %>% ungroup() %>% distinct() %>% inner_join(top_ligand_niche_df) %>% filter(niche == top_niche) %>% group_by(niche) %>% top_n(50, prioritization_score) %>% ungroup() # get the top50 ligands per niche
table(ligand_prioritized_tbl_oi$receiver)
# Malignant_High Malignant_Low
# 50 50
View(ligand_prioritized_tbl_oi)
然后就是可视化啦!首先我们展示top 受体-配体对,这里展示了每个配体的评分最高的top 2受体。(可以通过修改下面top_n后的数字修改)
receiver_oi = "Malignant_High"
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% pull(ligand) %>% unique()
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
Visualization: minimum LFC compared to other niches
lfc_plot = make_ligand_receptor_lfc_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot
Show the spatialDE as additional information(适用于空间数据)
lfc_plot_spatial = make_ligand_receptor_lfc_spatial_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, ligand_spatial = include_spatial_info_sender, receptor_spatial = include_spatial_info_receiver, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot_spatial
8.2 配体表达,活性和靶基因
visualization of ligand activity and ligand-target links
exprs_activity_target_plot = make_ligand_activity_target_exprs_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, prioritization_tables$prioritization_tbl_ligand_target, output$exprs_tbl_ligand, output$exprs_tbl_target, lfc_cutoff, ligand_target_matrix, plot_legend = FALSE, heights = NULL, widths = NULL)
exprs_activity_target_plot$combined_plot
基于这个plot,我们可以推断出许多假说假说,比如:“Interestingly, IL1 family ligands seem to have activity in inducing the DE genes between high pEMT and low pEMT malignant cells; and they are mainly expressed by myeloid cells, a cell type unique for pEMT-high tumors.”
important: ligand-receptor pairs with both high differential expression (or condition-specificity) and ligand activity (=target gene enrichment) are very interesting predictions as key regulators of your intercellular communication process of interest !
这个图默认展示top50的配体。如果觉得展示的信息太多,不容易从其中找到有价值的线索,也可以只展示top20。
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(20, prioritization_score) %>% pull(ligand) %>% unique()
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
exprs_activity_target_plot = make_ligand_activity_target_exprs_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, prioritization_tables$prioritization_tbl_ligand_target, output$exprs_tbl_ligand, output$exprs_tbl_target, lfc_cutoff, ligand_target_matrix, plot_legend = FALSE, heights = NULL, widths = NULL)
exprs_activity_target_plot$combined_plot
8.3 Circos plot of prioritized ligand-receptor pairs
Because a top50 is too much to visualize in a circos plot, we will only visualize the top 15.
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(15, prioritization_score) %>% pull(ligand) %>% unique()
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
colors_sender = brewer.pal(n = prioritized_tbl_oi$sender %>% unique() %>% sort() %>% length(), name = 'Spectral') %>% magrittr::set_names(prioritized_tbl_oi$sender %>% unique() %>% sort())
colors_receiver = c("lavender") %>% magrittr::set_names(prioritized_tbl_oi$receiver %>% unique() %>% sort())
circos_output = make_circos_lr(prioritized_tbl_oi, colors_sender, colors_receiver)
8.4 Interpretation of these results
多数排名靠前的差异性L-R对似乎来自仅存在于pEMT high肿瘤中的细胞类型。这可能部分是由于生物学的因素(在某种情况下某种独特的细胞类型可能非常重要),但也可能是由于优先排序的方式以及这些独特的细胞类型在其他niche中并没有 “counterpart”。
由于肿瘤微环境中,髓系细胞和T细胞与其他细胞有很大不同,因此它们的配体会显示出很强的差异表达。与来自相同细胞类型但不同niche/条件的细胞之间的差异表达(pEMT-high中的CAF与pEMT-low中的CAF相比)相比,同一niche/条件中(如pEMT-low肿瘤中)髓系/T 细胞vs肌成纤维细胞/CAFs/内皮细胞的差异表达可能更明显(这时候需要做同一niche不同细胞类型的互作差异)。
8.5 Visualization for the other condition: pEMT-low
前面可视化的是在pEMT-high niche中上调的受体配体对,我们也可以可视化在pEMT-low niche中上调的受体配体对。
receiver_oi = "Malignant_Low"
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(50, prioritization_score) %>% pull(ligand) %>% unique()
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
lfc_plot = make_ligand_receptor_lfc_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot
9. Notes, limitations, and comparison to default NicheNet.
在原始的NicheNet pipeline中,配体 - 受体对表达的排序仅仅是根据其配体活性得出的。在这个差异性NicheNet pipeline中,我们也是根据与其他niches(或者其他空间位置信息[空转数据])相比,L-R 对的差异表达来得到信息。
因为我们在这里关注配体- 受体对的差异表达,并且通过默认将DE而不是activity赋予更高的优先级权重,我们倾向于找到许多与默认NicheNet pipeline不同的hits。在Differential NicheNet中,我们倾向于找到更多的high-DE, low-activity hits,而使用default NicheNet,我们找到更多的low-DE, high-activity hits。
应该注意的是,一些high-DE, low-activity hits可能非常重要,因为它们可能是由于NicheNet活性预测的限制而具有低NicheNet活性(例如NicheNet中关于该配体的不正确/不完全的先验知识),但其中一些也可能在DE中很高,但activity不高,因为它们没有强烈的信号效应(例如,仅参与细胞粘附的配体)。
相反的,对于在Diffifer NicheNet中没有被强烈优先考虑的low-DE, high-activity的受体配体对,应考虑以下因素:1)一些配体受到转录后调控,高预测活性可能仍然反映的是真实的信号传导; 2)高预测活性值可能是由于NicheNet的局限性(不准确的先验知识),这些低DE配体在感兴趣的生物学过程中并不重要(但该配体的高DE家族成员可能是重要的。因为家族成员之间的信号传导往往非常相似); 3)一种情况下的高活性可能是由于另一种情况下的下调,导致高activity和DE低。目前,配体活性是在每个条件下根据上调基因计算的,但下调基因也可能是配体活性的标志。
当配体 - 受体对同时具有高DE和高activity时,我们可以认为它们是调节感兴趣过程的非常好的候选者,我们建议测试这些候选物以进行进一步的实验验证。
应用文献
References
Browaeys, R., Saelens, W. & Saeys, Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods (2019) doi:10.1038/s41592-019-0667-5
Guilliams et al. Spatial proteogenomics reveals distinct and evolutionarily conserved hepatic macrophage niches. Cell (2022) doi:10.1016/j.cell.2021.12.018
Puram, Sidharth V., Itay Tirosh, Anuraag S. Parikh, Anoop P. Patel, Keren Yizhak, Shawn Gillespie, Christopher Rodman, et al. 2017. “Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer.” Cell 171 (7): 1611–1624.e24. https://doi.org/10.1016/j.cell.2017.10.044.