今天再学习一下第三个适用于单细胞转录组数据的细胞通讯工具:NicheNet,与其他常见细胞通讯分析软件(CellphoneDB,cellchat等工具)相比而言,NicheNet的创新之处在于通过结合细胞表达数据、信号、基因调控网络的先验知识,来预测相互作用的细胞之间的配体-靶基因的连接情况(并且真实的细胞通讯过程是配体-受体-信号蛋白-转录调控因子-靶基因, 因此仅仅分析配受体之间的相互作用存在一定的局限性)。目前NicheNet可以处理人或者小鼠数据,以人类或小鼠的基因表达数据作为输入,并将其与通过整合配体到目标信号路径构建的先验模型进行结合。
什么情况下比较适用NichNet方法呢?
NicheNet 可以用于研究配体如何影响假定相邻或者相互作用的细胞群中的基因表达情况,并对得到的重要的配体进行优先排序。需要提供不同条件或者状态下,接受信号的细胞群(receiver cells)中的差异基因列表,通常认为这些基因的差异表达是发送信号细胞群中的配体导致的 。因此,可以是同一类型细胞在不同条件下的差异表达基因,也可以是两种细胞类型之间的差异表达基因,例如,如果它们是祖细胞和分化细胞类型,其中细胞分化会受微环境的影响。
如果只对稳态条件下的配体 - 受体相互作用感兴趣(并且只有稳态条件数据),可以采用CellphoneDB 等其他一些工具来实现, 不建议将NicheNet应用于稳态数据,主要是由于不存在一组明确地受细胞间通讯过程影响的基因,从而可能导致错误地将配体与细胞内的一些基因进行关联。
NicheNet: modeling intercellular communication by linking ligands to target genes | Nature Methods
———————————————————————————————
Major updates (20-06-2023)!
- MultiNicheNet - a multi-sample, multi-condition extension of NicheNet - is now available on biorxiv and Github.
- MultiNicheNet uses an updated prior model (v2) consisting of additional ligand-receptor interactions from the Omnipath database and from Verschueren et al. (2020). We have now also updated the vignettes of NicheNet to use the new model instead.
- New functionality: we have included additional functions to prioritize ligands not only based on the ligand activity, but also on the ligand and receptor expression, cell type specificity, and condition specificity. This is similar to the criteria used in Differential NicheNet and MultiNicheNet. See the Prioritizing ligands based on expression values vignette for more information.
- Due to this more generalizable prioritization scheme, we will no longer provide support for Differential NicheNet.
- We included code for making a ligand-receptor-target circos plot in the Circos plot visualization vignette.
————————————————————————————————————————————————————
下面开始今天的学习吧!
一、安装
# install.packages("devtools")
devtools::install_github("saeyslab/nichenetr")
二、加载所需要的R包、数据(在Seurat对象中读取经过处理的相互作用细胞的表达数据)
library(nichenetr)
library(Seurat) # please update to Seurat V4
library(tidyverse)
seuratObj <- readRDS('./seuratObj.rds')
seuratObj@meta.data %>% head()
seuratObj@meta.data$celltype %>% table() # note that the number of cells of some cell types is very low and should preferably be higher for a real application
#B cells CD4+ pro.T CD4+ T CD8+ CTL CD8+ T Erythrocytes Myeloid cells
#1021 288 6222 2406 370 79 4004
#NK cells NK T Platelets unknown Unknown Yδ T
#796 1806 136 47 290 874
DimPlot(seuratObj, reduction = "umap",group.by = 'celltype')
seuratObj@meta.data$Identity %>% table()
#ACs ATL
#9842 8497
DimPlot(seuratObj, reduction = "umap", group.by = "Identity")
三、加载NicheNet的配体-靶标先验模型、配体-受体网络和加权集成网络。
#organism = "human"
#lr_network = readRDS(url("https://zenodo.org/record/7074291/files/lr_network_human_21122021.rds"))
#ligand_target_matrix=readRDS(url("https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final.rds"))
#weighted_networks = readRDS(url("https://zenodo.org/record/7074291/files/weighted_networks_nsga2r_final.rds"))
ligand_target_matrix <- readRDS('./NicheNet/ligand_target_matrix_nsga2r_final.rds')
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
# A2M AANAT ABCA1 ACE ACE2
#A-GAMMA3'E 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.000000000
#A1BG 0.0018503922 0.0011108718 0.0014225077 0.0028594037 0.001139013
#A1BG-AS1 0.0007400797 0.0004677614 0.0005193137 0.0007836698 0.000375007
#A1CF 0.0024799266 0.0013026348 0.0020420890 0.0047921048 0.003273375
#A2M 0.0084693452 0.0040689323 0.0064256379 0.0105191365 0.005719199
lr_network <- readRDS('./NicheNet/lr_network_human_21122021.rds')
lr_network = lr_network %>% distinct(from, to)
head(lr_network)
# A tibble: 6 × 2
# from to
# <chr> <chr>
#1 A2M MMP2
#2 A2M MMP9
#3 A2M LRP1
#4 A2M KLK3
#5 AANAT MTNR1A
#6 AANAT MTNR1B
weighted_networks <- readRDS('./NicheNet/weighted_networks_nsga2r_final.rds')
weighted_networks_lr = weighted_networks$lr_sig %>% inner_join(lr_network, by = c("from","to"))
head(weighted_networks$lr_sig) # interactions and their weights in the ligand-receptor + signaling network
# A tibble: 6 × 3
# from to weight
# <chr> <chr> <dbl>
#1 A-GAMMA3'E ACTG1P11 0.100
#2 A-GAMMA3'E AXIN2 0.0869
#3 A-GAMMA3'E BUB1B-PAK6 0.0932
#4 A-GAMMA3'E CEACAM7 0.0793
#5 A-GAMMA3'E CHRNA1 0.0901
#6 A-GAMMA3'E DTX2P1 0.0976
head(weighted_networks$gr) # interactions and their weights in the gene regulatory network
# A tibble: 6 × 3
# from to weight
# <chr> <chr> <dbl>
#1 A1BG A2M 0.165
#2 AAAS GFAP 0.0906
#3 AADAC CTAG1B 0.104
#4 AADAC CYP3A4 0.177
#5 AADAC DIRAS3 0.0936
#6 AADAC IRF8 0.0892
四、执行NicheNet分析
1. Define a “sender/niche” cell population and a “receiver/target” cell population present in your expression data and determine which genes are expressed in both populations
## receiver
Idents(seuratObj) <- 'celltype'
receiver = "CD4+ pro.T"
expressed_genes_receiver = get_expressed_genes(receiver, seuratObj, pct = 0.10)
background_expressed_genes = expressed_genes_receiver %>% .[. %in% rownames(ligand_target_matrix)]
## sender
sender_celltypes = c("CD4+ T","CD8+ CTL", "B cells", "NK T", "Myeloid cells")
list_expressed_genes_sender = sender_celltypes %>% unique() %>% lapply(get_expressed_genes, seuratObj, 0.10) # lapply to get the expressed genes of every sender cell type separately here
expressed_genes_sender = list_expressed_genes_sender %>% unlist() %>% unique()
2. Define a gene set of interest: these are the genes in the “receiver/target” cell population that are potentially affected by ligands expressed by interacting cells (e.g. genes differentially expressed upon cell-cell interaction)
seurat_obj_receiver= subset(seuratObj, idents = receiver)
seurat_obj_receiver = SetIdent(seurat_obj_receiver, value = seurat_obj_receiver[["Identity"]])
condition_oi = "ATL"
condition_reference = "ACs"
DE_table_receiver = FindMarkers(object = seurat_obj_receiver, ident.1 = condition_oi, ident.2 = condition_reference, min.pct = 0.10) %>% rownames_to_column("gene")
geneset_oi = DE_table_receiver %>% filter(p_val_adj <= 0.05 & abs(avg_log2FC) >= 0.25) %>% pull(gene)
geneset_oi = geneset_oi %>% .[. %in% rownames(ligand_target_matrix)]
3. Define a set of potential ligands: these are ligands that are expressed by the “sender/niche” cell population and bind a (putative) receptor expressed by the “receiver/target” population
ligands = lr_network %>% pull(from) %>% unique()
receptors = lr_network %>% pull(to) %>% unique()
expressed_ligands = intersect(ligands,expressed_genes_sender)
expressed_receptors = intersect(receptors,expressed_genes_receiver)
potential_ligands = lr_network %>% filter(from %in% expressed_ligands & to %in% expressed_receptors) %>% pull(from) %>% unique()
4. Perform NicheNet ligand activity analysis:rank the potential ligands based on the presence of their target genes in the gene set of interest (compared to the background set of genes)
ligand_activities = predict_ligand_activities(geneset = geneset_oi, background_expressed_genes = background_expressed_genes, ligand_target_matrix = ligand_target_matrix, potential_ligands = potential_ligands)
#The different ligand activity measures (auroc, aupr, pearson correlation coefficient) are a measure for how well a ligand can predict the observed differentially expressed genes compared to the background of expressed genes.
ligand_activities = ligand_activities %>% arrange(-aupr_corrected) %>% mutate(rank = rank(desc(aupr_corrected)))
ligand_activities
# A tibble: 141 × 6
test_ligand auroc aupr aupr_corrected pearson rank
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FCER2 0.524 0.0290 0.0177 0.0405 1
2 CLEC2D 0.527 0.0278 0.0165 0.00547 2
3 HMGB1 0.508 0.0197 0.00844 0.0362 3
4 IFITM1 0.541 0.0196 0.00828 0.0277 4
5 MICB 0.531 0.0193 0.00804 0.0178 5
6 IL1B 0.513 0.0183 0.00697 0.0409 6
best_upstream_ligands = ligand_activities %>% top_n(30, aupr_corrected) %>% arrange(-aupr_corrected) %>% pull(test_ligand) %>% unique()
DotPlot(seuratObj, features = best_upstream_ligands %>% rev(), cols = "RdYlBu") + RotatedAxis()
5、Infer receptors and top-predicted target genes of ligands that are top-ranked in the ligand activity analysis
#Active target gene inference
active_ligand_target_links_df = best_upstream_ligands %>% lapply(get_weighted_ligand_target_links,geneset = geneset_oi, ligand_target_matrix = ligand_target_matrix, n = 200) %>% bind_rows() %>% drop_na()
active_ligand_target_links = prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.33)
order_ligands = intersect(best_upstream_ligands, colnames(active_ligand_target_links)) %>% rev() %>% make.names()
order_targets = active_ligand_target_links_df$target %>% unique() %>% intersect(rownames(active_ligand_target_links)) %>% make.names()
rownames(active_ligand_target_links) = rownames(active_ligand_target_links) %>% make.names() # make.names() for heatmap visualization of genes like H2-T23
colnames(active_ligand_target_links) = colnames(active_ligand_target_links) %>% make.names() # make.names() for heatmap visualization of genes like H2-T23
vis_ligand_target = active_ligand_target_links[order_targets,order_ligands] %>% t()
p_ligand_target_network = vis_ligand_target %>% make_heatmap_ggplot("Prioritized ligands","Predicted target genes", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential") + theme(axis.text.x = element_text(face = "italic")) + scale_fill_gradient2(low = "whitesmoke", high = "purple", breaks = c(0,0.0045,0.0090))
p_ligand_target_network
#Receptors of top-ranked ligands
lr_network_top = lr_network %>% filter(from %in% best_upstream_ligands & to %in% expressed_receptors) %>% distinct(from,to)
best_upstream_receptors = lr_network_top %>% pull(to) %>% unique()
lr_network_top_df_large = weighted_networks_lr %>% filter(from %in% best_upstream_ligands & to %in% best_upstream_receptors)
lr_network_top_df = lr_network_top_df_large %>% spread("from","weight",fill = 0)
lr_network_top_matrix = lr_network_top_df %>% select(-to) %>% as.matrix() %>% magrittr::set_rownames(lr_network_top_df$to)
dist_receptors = dist(lr_network_top_matrix, method = "binary")
hclust_receptors = hclust(dist_receptors, method = "ward.D2")
order_receptors = hclust_receptors$labels[hclust_receptors$order]
dist_ligands = dist(lr_network_top_matrix %>% t(), method = "binary")
hclust_ligands = hclust(dist_ligands, method = "ward.D2")
order_ligands_receptor = hclust_ligands$labels[hclust_ligands$order]
order_receptors = order_receptors %>% intersect(rownames(lr_network_top_matrix))
order_ligands_receptor = order_ligands_receptor %>% intersect(colnames(lr_network_top_matrix))
vis_ligand_receptor_network = lr_network_top_matrix[order_receptors, order_ligands_receptor]
rownames(vis_ligand_receptor_network) = order_receptors %>% make.names()
colnames(vis_ligand_receptor_network) = order_ligands_receptor %>% make.names()
p_ligand_receptor_network = vis_ligand_receptor_network %>% t() %>% make_heatmap_ggplot("Ligands","Receptors", color = "mediumvioletred", x_axis_position = "top",legend_title = "Prior interaction potential")
p_ligand_receptor_network
6.Add log fold change information of ligands from sender cells
# DE analysis for each sender cell type
# this uses a new nichenetr function - reinstall nichenetr if necessary!
DE_table_all = Idents(seuratObj) %>% levels() %>% intersect(sender_celltypes) %>% lapply(get_lfc_celltype, seurat_obj = seuratObj, condition_colname = "Identity", condition_oi = condition_oi, condition_reference = condition_reference, expression_pct = 0.10, celltype_col = NULL) %>% reduce(full_join) # use this if cell type labels are the identities of your Seurat object -- if not: indicate the celltype_col properly
DE_table_all[is.na(DE_table_all)] = 0
# Combine ligand activities with DE information
ligand_activities_de = ligand_activities %>% select(test_ligand, pearson) %>% rename(ligand = test_ligand) %>% left_join(DE_table_all %>% rename(ligand = gene))
ligand_activities_de[is.na(ligand_activities_de)] = 0
# make LFC heatmap
lfc_matrix = ligand_activities_de %>% select(-ligand, -pearson) %>% as.matrix() %>% magrittr::set_rownames(ligand_activities_de$ligand)
rownames(lfc_matrix) = rownames(lfc_matrix) %>% make.names()
order_ligands = order_ligands[order_ligands %in% rownames(lfc_matrix)]
vis_ligand_lfc = lfc_matrix[order_ligands,]
colnames(vis_ligand_lfc) = vis_ligand_lfc %>% colnames() %>% make.names()
p_ligand_lfc = vis_ligand_lfc %>% make_threecolor_heatmap_ggplot("Prioritized ligands","LFC in Sender", low_color = "midnightblue",mid_color = "white", mid = median(vis_ligand_lfc), high_color = "red",legend_position = "top", x_axis_position = "top", legend_title = "LFC") + theme(axis.text.y = element_text(face = "italic"))
p_ligand_lfc
OK,今天的学习先到这里,NicheNet方法的可操作性还有一些没整理,后面会补更。如有错误之处,感谢指正!谢谢☺️