前言:
大家好,今天我们一起学习一下如何利用CellphoneDB对单细胞分析结果进行互作分析,当前CellphoneDB的最新版本是4.0.0,可以对单细胞中不同细胞群水平上的互作基因对进行评估。我们在这里利用该版本进行实操。
软件安装:
# 创建、切换环境,
conda create -n cellphonedb python=3.8 #python3.8支持CellphoneDB 4.0。
source activate cellphonedb
# 安装需要软件
pip install cellphonedb -i https://pypi.tuna.tsinghua.edu.cn/simple
# 检查版本信息
>>> import cellphonedb
>>> cellphonedb.version
'4.0.0'
输入数据说明:
我们在Cellphonedb分析过程中,需要用到官方提供的数据库,可以自行下载:
Wget https://github.com/ventolab/cellphonedb-data # 当前支持人和小鼠。
# 定义数据库路径
cellphonedbpath = "/data/xxx/cellphonedb/cellphonedb.zip"
outdir = "/data/xxx/cellphonedb_out/" # 设置输出文件路径:
输入数据我们可以去下载cellphonedb官方示例,这篇文章我们使用的数据是自己准备的,准备示例如下:
pbmc = readRDS(raw.rds)
write.table(as.matrix(pbmc@assays$RNA@data),'cellphonedb_count.txt', sep='\t', quote=F)
meta_data <- cbind(rownames(pbmc@meta.data), pbmc@meta.data[,'cell_type', drop=F])
meta_data <- as.matrix(meta_data)
meta_data[is.na(meta_data)] = "Unkown" # 细胞类型为NA的转换成 "Unkown" 字符串
write.table(meta_data,'cellphonedb_meta.txt', sep='\t', quote=F, row.names=F)
如果项目类型为空间转录组,我们需要额外准备微环境文件:
# 官方给出来的数据结构,第一列是细胞类型,第二列是空间微环境。
cell_type microenviroment
epi_Ciliated Proliferative
epi_Pre-ciliated Proliferative
epi_SOX9_LGR5 Proliferative
epi_SOX9_prolif Proliferative
epi_SOX9 Proliferative
FibroblasteS Proliferative
Lymphoid Proliferative
Myeloid Proliferative
FibroblastC7 Proliferative
软件说明:
Cellphonedb提供了三种计算模式,分别对应不同的分析需求。
- 模式一:cpdb_analysis_method
CellphoneDB在此模式下不进行统计分析,单纯输出细胞群水平下受配体对的平均值。既然是均值,当然是需要保证在一种细胞类型中,有多个细胞同时表达该基因才可以进行均值计算,有多少细胞表达该基因,就是由阈值参数进行规定了。否则,该配受体对是会被cellphonedb进行过滤掉的。
from cellphonedb.src.core.methods import cpdb_analysis_method
means, deconvoluted = cpdb_analysis_method.call(
cpdb_file_path = cellphonedbpath,
meta_file_path = "/data/xxx/cellphonedb/cellphonedb_meta.csv",
counts_file_path = "/data/xxx/cellphonedb/cellphonedb_count.csv", #也可以输入h5文件。
counts_data = 'hgnc_symbol',
output_path = outdir )
文件输出:
simple_analysis_deconvoluted_result.txt
simple_analysis_means_result.txt
- 模式二:statistical_analysis
CellphoneDB在此模式下进行统计分析,用于评估数据集中可能发生的所有相互作用的显著性。CellphoneDB 使用 empirical shuffling 来计算配体-受体对在相应细胞类型中的显著性P值。简单而言,它通过随机排列所有细胞的簇标记来估计相互作用簇中平均配体和受体表达的平均值的零分布。然后根据与实际均值一样高或更高的比例计算其P值。
from cellphonedb.src.core.methods import cpdb_statistical_analysis_method
deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
cpdb_file_path = cellphonedbpath,
meta_file_path = "/data/xxx/cellphonedb/cellphonedb_meta.csv",
counts_file_path = "/data/xxx/cellphonedb/cellphonedb_count.csv",
counts_data = 'hgnc_symbol',
output_path = outdir )
文件输出:
statistical_analysis_deconvoluted.txt
statistical_analysis_means.txt
statistical_analysis_pvalues.txt
statistical_analysis_significant_means.txt
- 模式三:degs_analysis
CellphoneDB在此模式下允许用户提供一个输入文件(DEGlist2.txt:第一列细胞类型,第二列基因名),cellphonedb将选择相互作用对中的所有基因在相应的细胞类型中由超过 20%的细胞表达(阈值=0.2),并且至少一个基因-细胞类型对在所提供的 DEG.tsv 文件中。
# DEGlist2.txt 格式要求
Macrophage TSPAN6
Macrophage DPM1
Macrophage SCYL3
Macrophage C1orf112
from cellphonedb.src.core.methods import cpdb_degs_analysis_method
deconvoluted, means, relevant_interactions, significant_means = cpdb_degs_analysis_method.call(
cpdb_file_path = cellphonedbpath,
meta_file_path = "/data1/ZY/cellphonedb/cellphonedb_meta.csv",
counts_file_path = "/data1/ZY/cellphonedb/cellphonedb_count.csv",
degs_file_path = "/data1/ZY/cellphonedb/DEGlist2.txt",
counts_data = 'hgnc_symbol',
threshold = 0.2,
output_path = outdir)
degs_analysis_deconvoluted_result.txt
degs_analysis_means_result.txt
degs_analysis_relevant_interactions_result.txt
degs_analysis_significant_means.txt
数据可视化
CellphoneDB 官网推荐两种可视化方案,分别为:ktplots
、tplotspy
(python
),本推文我们以 python 包 ktplotspy
为例进行演示:
import os
import anndata as ad
import pandas as pd
import ktplotspy as kpy
import matplotlib.pyplot as plt
adata = pd.read_csv("/data/xxx/cellphonedb/cellphonedb_meta.csv")
xdata = pd.read_csv("/data/xxx/cellphonedb/cellphonedb_count.csv")
# 生成anndata对象
andata = ad.AnnData(xdata.T,obs=adata)
# andata = ad.read_h5ad("test.h5ad") # andata也可以直接读取h5文件。
# 以下三个文件均为以上cellphonedb的输出文件。
means = pd.read_csv("/data/xxx/cellphonedb/statistical_analysis_means_04_07_2023_02:52:43.txt", sep="\t")
pvals = pd.read_csv("/data/xxx/cellphonedb/statistical_analysis_pvalues_04_07_2023_02:52:43.txt", sep="\t")
decon = pd.read_csv("/data/xxx/cellphonedb/statistical_analysis_deconvoluted_04_07_2023_02:52:43.txt", sep="\t")
- 绘制热图。通过显著受配体体对情况比较不同的细胞群之间的相关性。
kpy.plot_cpdb_heatmap(
adata=andata,
pvals=pvals,
celltype_key="celltype",
figsize=(5, 5),
symmetrical=True,
title="Sum of significant interactions")
plt.savefig("plot_cpdb_heatmap.pdf")
- 点图绘制,可以进行基因和细胞的选择。
g = kpy.plot_cpdb(
adata=andata,
cell_type1="Monocyte",
cell_type2=".", # all cell-types
means=means,
pvals=pvals,
celltype_key="celltype",
genes=["MTMR7","SLC7A2","ARF5","SARM1","POLDIP2","PLXND1","AK2","CD38","FKBP4","KDM1A","RBM6","CAMKK1","RECQL","VPS50","HSPB6","ARHGAP33","NDUFAB1","PDK4","SLC22A16","ZMYND10","ARX","SLC25A13","ST7","CDC27","SLC4A1","CALCR","HCCS","DVL2","PRSS22","UPF1","SKAP2","SLC25A5","MCUB","POLR2J","DHX33","MEOX1","THSD7A","LIG3","RPAP3","ACSM3","REXO5","CIAPIN1","SPPL2B","FAM214B","COPZ2","PRKAR2B","MSL3","CREBBP","TSPOAP1","MPO","GCFC2","WDR54","CROT","ABCB4","KMT2E","RHBDD2","IBTK","ZNF195"],#, "DPM1"],
figsize=(10, 2),
title="interacting interactions!",
# gene_family="chemokines", 支持筛选基因家族。
# highlight_size=1, 绘制全部的受配体。
)
g.save(filename = 'plot_cpdb.pdf', height=5, width=5, units = 'in', dpi=1000)
# 该图源码中使用的plotnine进行绘制,返回对象后,使用save方法进行图片保存。
- 弦图绘制 展示不同细胞群之间的配体受体对。
circle = kpy.plot_cpdb_chord(
adata=andata,
cell_type1="Monocyte",
cell_type2=".",
means=means,
pvals=pvals,
deconvoluted=decon,
celltype_key="celltype",
#genes=["VPS50","HSPB6","ARHGAP33","NDUFAB1","PDK4","SLC22A16"],
figsize=(6, 6),
'''
face_col_dict={ # 定义环的颜色。
"B cell": "red",
"NK cell": "blue",
"CD4T cell": "black",
"pDC": "brown",
"Neutrophil": "grey",
"Mast cell": "orange",
"NKT cell": "pink",
"CD8T cell": "cyan",
},
edge_cmap 定义边颜色
'''
)
circle.save(file_name = 'plot_cpdb_chord',format='pdf', dpi=300)
# 该图返回的是Gcircle对象,save方法的参数有所改变,请注意。
- 补充seurat对象保存h5的方法
library(SeuratDisk)
library(Seurat)
SaveH5Seurat(seurat.obj,filename="test.h5seurat", overwrite = TRUE)
Convert("test.h5seurat", dest = "h5ad", overwrite = TRUE)