10X single-cell spatial communication analysis: a guide to the latest CellphoneDB (v4)

Author: Evil Genius

I recently taught a class on cell-cell communication and noticed that many tools have been updated, so here I share what is new in CellphoneDB v4.0.

Previous post: "Communication analysis that accounts for spatial location: CellphoneDB (v3.0)"

Installation has changed: CellphoneDB is now fully packaged and can be installed from the Linux command line, directly inside a conda environment.

conda create -n cpdb python=3.8

conda activate cpdb

pip install cellphonedb

Updates to the analysis methods (three methods to choose from)

  • METHOD 1 simple analysis (>= v1): Here, no statistical analysis is performed. CellphoneDB will output the mean for all the interactions for each cell type pair combination. Note that CellphoneDB will report the means only if all the gene members of the interactions are expressed by at least a fraction of cells in a cell type (threshold). If the condition threshold is not met, the interaction will be ignored in the corresponding cell type pairs.
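With Method 1, every cell type pair that expresses the ligand and the receptor is reported directly, with no significance test.
from cellphonedb.src.core.methods import cpdb_analysis_method

# File names are placeholders; out_path is an output directory defined beforehand.
means, deconvoluted = cpdb_analysis_method.call(
         cpdb_file_path = 'cellphonedb.zip',      # CellphoneDB database file
         meta_file_path = 'test_meta.txt',        # cell to cell type annotation
         counts_file_path = 'test_counts.h5ad',   # expression matrix
         counts_data = 'hgnc_symbol',             # gene identifier type used in the counts
         output_path = out_path)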

The output contains only the means.txt and deconvoluted.txt tables for the receptor-ligand pairs.
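To make the threshold rule of Method 1 concrete, here is a minimal sketch in plain Python. It is not CellphoneDB's own code: the toy expression values, the cell type and gene names, and the helper `interaction_mean` are all assumptions made up for illustration.

```python
from statistics import mean

# Toy data (invented): per cell type, gene -> expression in each cell.
expr = {
    "T_cell":   {"LIG": [2.0, 0.0, 4.0, 0.0], "REC": [0.0, 0.0, 0.0, 0.0]},
    "Monocyte": {"LIG": [0.0, 0.0, 0.0, 0.0], "REC": [1.0, 3.0, 0.0, 0.0]},
}

def fraction_expressing(values):
    """Fraction of cells with non-zero expression."""
    return sum(v > 0 for v in values) / len(values)

def interaction_mean(expr, cell_a, gene_a, cell_b, gene_b, threshold=0.1):
    """Mean of the two per-cell-type averages, or None when either gene
    is expressed in fewer than `threshold` of the cells."""
    a = expr[cell_a][gene_a]
    b = expr[cell_b][gene_b]
    if fraction_expressing(a) < threshold or fraction_expressing(b) < threshold:
        return None  # interaction ignored for this cell type pair
    return (mean(a) + mean(b)) / 2

# Ligand in T cells, receptor in monocytes: both pass the threshold.
forward = interaction_mean(expr, "T_cell", "LIG", "Monocyte", "REC")
# Ligand in monocytes, receptor in T cells: neither gene is expressed.
reverse = interaction_mean(expr, "Monocyte", "LIG", "T_cell", "REC")
```

The two directions of the same cell type pair give different answers here, which is why each ordered pair gets its own column in the output.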

  • METHOD 2 statistical_analysis (>= v1): This is a statistical analysis that evaluates for significance all the interactions that can potentially occur in your dataset: i.e. between ALL the potential cell type pairs. Here, CellphoneDB uses empirical shuffling to calculate which ligand–receptor pairs display significant cell-type specificity. Specifically, it estimates a null distribution of the mean of the average ligand and receptor expression in the interacting clusters by randomly permuting the cluster labels of all cells. The P value for the likelihood of cell-type specificity of a given receptor–ligand complex is calculated on the basis of the proportion of the means that are as high as or higher than the actual mean.

With Method 2, each ligand-receptor pair is put through a hypothesis test.

  • Only receptors and ligands expressed in more than a user-specified threshold percentage of the cells in the specific cluster (threshold default is 0.1) are tested and will get a mean value in the significant.txt output.
  • For the multi-subunit heteromeric complexes, we require that:
    1. all subunits of the complex are expressed by a proportion of cells (threshold), and then
    2. the member of the complex with the minimum expression is used to compute the interaction means and perform the random shuffling.
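The two requirements for heteromeric complexes can be sketched as a small helper. This is an illustration under assumptions, not the library's implementation: `complex_expression`, the subunit names and the numbers are hypothetical.

```python
def complex_expression(subunit_means, subunit_fractions, threshold=0.1):
    """Expression value used for a complex in one cluster.

    subunit_means:     subunit -> average expression in the cluster
    subunit_fractions: subunit -> fraction of cells expressing it
    Returns None when any subunit falls below the threshold; otherwise
    the mean of the least expressed subunit.
    """
    if any(frac < threshold for frac in subunit_fractions.values()):
        return None
    return min(subunit_means.values())

# Hypothetical two-subunit receptor.
means = {"SUB_A": 2.4, "SUB_B": 0.7}
ok = complex_expression(means, {"SUB_A": 0.6, "SUB_B": 0.3})
dropped = complex_expression(means, {"SUB_A": 0.6, "SUB_B": 0.05})
```

Taking the minimum means a complex is only as strongly "expressed" as its scarcest subunit.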
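Next, all cell types are compared pairwise. First, the cluster labels of all cells are randomly permuted (1,000 times by default), and the mean of the average receptor expression in one cluster and the average ligand expression in the interacting cluster is computed. For each receptor-ligand pair in each pairwise comparison between two cell types, this yields a null distribution. The p-value for the likelihood of cell-type specificity of a given receptor-ligand complex is the proportion of permuted means that are as high as or higher than the actual mean. Interactions that are highly enriched between cell types are then prioritised by the number of significant pairs, so that biologically relevant interactions can be selected manually.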
from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

# File names are placeholders; out_path is an output directory defined beforehand.
deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
        cpdb_file_path = 'cellphonedb.zip',
        meta_file_path = 'test_meta.txt',
        counts_file_path = 'test_counts.h5ad',
        counts_data = 'hgnc_symbol',
        output_path = out_path)
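The label shuffling behind Method 2 can be sketched for a single ligand-receptor pair and a single ordered cluster pair. This is a simplified illustration in plain Python, not CellphoneDB's implementation: the function names, the toy data and the fixed seed are assumptions, and the expression threshold is left out.

```python
import random
from statistics import mean

def cluster_mean(values, labels, cluster):
    """Average expression of one gene over the cells labelled `cluster`."""
    return mean(v for v, lab in zip(values, labels) if lab == cluster)

def permutation_pvalue(ligand, receptor, labels, cluster_a, cluster_b,
                       n_iter=1000, seed=0):
    """Empirical p-value for ligand in cluster_a and receptor in cluster_b."""
    def statistic(lab):
        return (cluster_mean(ligand, lab, cluster_a)
                + cluster_mean(receptor, lab, cluster_b)) / 2

    observed = statistic(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # permute cluster labels across all cells
        if statistic(shuffled) >= observed:
            hits += 1
    return observed, hits / n_iter

# Toy data (invented): ligand only in cluster A, receptor only in cluster B.
labels = ["A", "A", "A", "B", "B", "B"]
ligand = [5.0, 6.0, 7.0, 0.0, 0.0, 0.0]
receptor = [0.0, 0.0, 0.0, 4.0, 5.0, 6.0]
observed, p_value = permutation_pvalue(ligand, receptor, labels, "A", "B")
```

Because shuffling only reorders the labels, cluster sizes stay fixed and every permuted mean is computed over the same number of cells as the observed one.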
  • METHOD 3 degs_analysis (>= v3): This method is proposed as an alternative to the statistical inference approach. This approach allows the user to design more complex comparisons to retrieve interactions specific to a cell type of interest. This is particularly relevant when your research question goes beyond comparing “one” cell type vs “the rest”. Examples of alternative contrasts are hierarchical comparisons (e.g. you are interested in a specific lineage, such as epithelial cells, and wish to identify the genes changing their expression within this lineage) or comparing disease vs control (e.g. you wish to identify upregulated genes in disease T cells by comparing them against control T cells). For this CellphoneDB method (cpdb_degs_analysis_method), the user provides an input file (degs_file.txt in the command below) indicating which genes are relevant for a cell type (for example, marker genes or significantly upregulated genes resulting from a differential expression analysis (DEG)). CellphoneDB will select interactions where:

    1. all the genes in the interaction are expressed in the corresponding cell type by more than 10% of cells (threshold = 0.1), and
    2. at least one gene-cell type pair is in the provided DEG.tsv file.

from cellphonedb.src.core.methods import cpdb_degs_analysis_method

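# File names are placeholders; out_path is an output directory defined beforehand.
deconvoluted, means, relevant_interactions, significant_means = cpdb_degs_analysis_method.call(
         cpdb_file_path = 'cellphonedb.zip',
         meta_file_path = 'test_meta.txt',
         counts_file_path = 'test_counts.h5ad',
         degs_file_path = 'degs_file.txt',      # genes relevant to each cell type
         counts_data = 'hgnc_symbol',
         threshold = 0.1,                       # minimum fraction of expressing cells
         output_path = out_path)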
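The two selection conditions of Method 3 amount to a simple boolean rule. The sketch below illustrates that rule only and is not the library's code: `is_relevant`, the cell type and gene names, and the fractions are hypothetical.

```python
def is_relevant(partners, fraction_expressing, degs, threshold=0.1):
    """Return 1 if the interaction is relevant for one cell type pair, else 0.

    partners:            (cell_type, gene) tuples for every gene taking part
    fraction_expressing: (cell_type, gene) -> fraction of expressing cells
    degs:                set of (cell_type, gene) pairs from the DEG file
    """
    all_expressed = all(
        fraction_expressing.get(p, 0.0) > threshold for p in partners
    )
    any_deg = any(p in degs for p in partners)
    return int(all_expressed and any_deg)

# Toy inputs (invented).
fractions = {
    ("T_cell", "LIG"): 0.40, ("Monocyte", "REC"): 0.25,
    ("B_cell", "REC"): 0.05,
    ("Monocyte", "LIG"): 0.30, ("T_cell", "REC"): 0.50,
}
degs = {("T_cell", "LIG")}

expressed_and_deg = is_relevant([("T_cell", "LIG"), ("Monocyte", "REC")], fractions, degs)
below_threshold = is_relevant([("T_cell", "LIG"), ("B_cell", "REC")], fractions, degs)
no_deg = is_relevant([("Monocyte", "LIG"), ("T_cell", "REC")], fractions, degs)
```

Only the first case yields 1: the second fails the expression threshold and the third contains no gene-cell type pair from the DEG list.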
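This method lets you design the gene expression comparison freely so that it better matches the research question. With Method 2, the null hypothesis (and the background distribution) considers every cell type in the dataset and compares "one" cell type against "the rest". An analysis may, however, call for a different approach that better reflects the study design. Some examples:

  • The analysis needs to account for technical batches or biological covariates. Here it is better to rely on a differential expression method that models these confounders and to pass the results directly to CellphoneDB.

  • You are interested in specificity within a particular lineage and want a hierarchical differential expression analysis (e.g. you are interested in a lineage such as epithelial cells and wish to identify the genes that change expression within it; research question: which interactions are upregulated in epithelial cell type a compared with epithelial cell type b?).

  • You want to compare a specific population between disease and control (e.g. identifying upregulated genes in disease T cells by comparing them against control T cells; research question: which interactions are upregulated in disease T cells?).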
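As a sketch of how results from your own differential expression analysis might be turned into a DEG input, the snippet below filters a toy result table and serialises it as tab-separated text. Everything here is an assumption for illustration: the column names, the cutoffs, and the two-column layout (cell type, then gene), which should be checked against the CellphoneDB documentation before use.

```python
import csv
import io

# Hypothetical differential expression results (invented names and values).
de_results = [
    {"cell_type": "T_cell", "gene": "GENE_A", "logFC": 1.8, "adj_pval": 0.001},
    {"cell_type": "T_cell", "gene": "GENE_B", "logFC": 0.2, "adj_pval": 0.400},
    {"cell_type": "Monocyte", "gene": "GENE_C", "logFC": 2.3, "adj_pval": 0.010},
    {"cell_type": "Monocyte", "gene": "GENE_D", "logFC": -1.5, "adj_pval": 0.002},
]

def select_degs(results, min_logfc=1.0, max_adj_pval=0.05):
    """Keep (cell_type, gene) pairs that are significantly upregulated."""
    return [
        (r["cell_type"], r["gene"])
        for r in results
        if r["logFC"] >= min_logfc and r["adj_pval"] <= max_adj_pval
    ]

def to_tsv(pairs):
    """Serialise the pairs as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["cell_type", "gene"])  # assumed column layout
    writer.writerows(pairs)
    return buffer.getvalue()

degs = select_degs(de_results)
degs_text = to_tsv(degs)
```

Writing `degs_text` to disk would give a file that could be passed as `degs_file_path`, provided the assumed layout matches what your CellphoneDB version expects.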

To include spatial information, see the earlier post "Communication analysis that accounts for spatial location: CellphoneDB (v3.0)".

Interpreting the results

Output files

All files (except “deconvoluted.txt”) follow the same structure: rows depict interacting proteins while columns represent interacting cell type pairs.

  • The “means.txt” file contains mean values for each ligand-receptor interaction (rows) for each cell-cell interaction pair (columns).

  • The “pvalues.txt” contains the P values for the likelihood of cell-type specificity of a given receptor–ligand complex (rows) in each cell-cell interaction pair (columns), resulting from the statistical_analysis.

  • The “significant_means.txt” contains the mean expression (same as “means.txt”) of the significant receptor–ligand complex only. This is the result of crossing “means.txt” and “pvalues.txt”.

  • The “relevant_interactions.txt” contains a binary matrix indicating if the interaction is relevant (1) or not (0). An interaction is classified as relevant if a gene is a DEG in a cluster/cell type (information provided by the user in the DEG.tsv file) and all the participant genes are expressed. Alternatively, the value is set to 0. This file is specific to degs_analysis. Each row corresponds to a ligand-receptor interaction, while each column corresponds to a cell-cell interacting pair.

  • The “deconvoluted.txt” file gives additional information for each of the interacting partners. This is important as some of the interacting partners are heteromers. In other words, multiple molecules have to be expressed in the same cluster in order for the interacting partner to be functional.

See below the meaning of each column in the outputs:

P-value (pvalues.txt), Mean (means.txt), Significant mean (significant_means.txt) and Relevant interactions (relevant_interactions.txt)
  • id_cp_interaction: Unique CellphoneDB identifier for each interaction stored in the database.
  • interacting_pair: Name of the interacting pairs separated by “|”.
  • partner A or B: Identifier for the first interacting partner (A) or the second (B). It could be: UniProt (prefix simple:) or complex (prefix complex:)
  • gene A or B: Gene identifier for the first interacting partner (A) or the second (B). The identifier will depend on the input user list.
  • secreted: True if one of the partners is secreted.
  • Receptor A or B: True if the first interacting partner (A) or the second (B) is annotated as a receptor in our database.
  • annotation_strategy: Curated if the interaction was annotated by the CellphoneDB developers. Otherwise, the name of the database where the interaction has been downloaded from.
  • is_integrin: True if one of the partners is integrin.
  • rank: Total number of significant p-values for each interaction divided by the number of cell type-cell type comparisons. (Only in significant_means.txt)
  • means: Mean values for all the interacting partners: mean value refers to the total mean of the individual partner average expression values in the corresponding interacting pairs of cell types. If one of the mean values is 0, then the total mean is set to 0. (Only in means.txt)
  • p.values: p-values for all the interacting partners: p.value refers to the enrichment of the interacting ligand-receptor pair in each of the interacting pairs of cell types. (Only in pvalues.txt)
  • significant_mean: Significant mean calculation for all the interacting partners. If p.value < 0.05, the value will be the mean. Alternatively, the value is set to 0. (Only in significant_means.txt)
  • relevant_interactions: Indicates if the interaction is relevant (1) or not (0). If a gene in the interaction is a DEG (i.e. a gene in the DEG.tsv file), and all the participant genes are expressed, the interaction will be classified as relevant. Alternatively, the value is set to 0. (Only in relevant_interactions.txt)
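The significant_mean and rank definitions above can be written out as a short sketch of how means and p-values are crossed. It follows the wording of this list (non-significant entries set to 0) rather than any particular release of the tool; the function name, the interaction label `pair_1` and the `A|B` column naming are assumptions.

```python
def cross_means_pvalues(means, pvalues, alpha=0.05):
    """Combine means and p-values per interaction.

    means, pvalues: interaction -> {cell_pair: value}
    Returns (significant_means, rank) where rank is the share of
    cell type pairs with a significant p-value.
    """
    significant, rank = {}, {}
    for interaction, row in means.items():
        pvals = pvalues[interaction]
        significant[interaction] = {
            pair: (m if pvals[pair] < alpha else 0.0) for pair, m in row.items()
        }
        n_significant = sum(pvals[pair] < alpha for pair in row)
        rank[interaction] = n_significant / len(row)
    return significant, rank

# Toy tables (invented) for one interaction across four ordered cell type pairs.
means = {"pair_1": {"A|A": 0.9, "A|B": 1.25, "B|A": 0.4, "B|B": 0.0}}
pvalues = {"pair_1": {"A|A": 0.04, "A|B": 0.001, "B|A": 0.3, "B|B": 1.0}}
significant_means, rank = cross_means_pvalues(means, pvalues)
```

With these toy numbers two of the four comparisons are significant, so the rank is 0.5.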

Again, remember that the interactions are not symmetric. IL12-IL12 receptor for clusterA clusterB (i.e. receptor is in clusterB) is not the same as IL12-IL12 receptor for clusterB clusterA (i.e. receptor is in clusterA).

Deconvoluted (deconvoluted.txt)

  • gene_name: Gene identifier for one of the subunits that are participating in the interaction defined in the “means.txt” file. The identifier will depend on the input of the user list.

  • uniprot: UniProt identifier for one of the subunits that are participating in the interaction defined in the “means.txt” file.

  • is_complex: True if the subunit is part of a complex. Single if it is not, complex if it is.

  • protein_name: Protein name for one of the subunits that are participating in the interaction defined in the “means.txt” file.

  • complex_name: Complex name if the subunit is part of a complex. Empty if not.

  • id_cp_interaction: Unique CellphoneDB identifier for each of the interactions stored in the database.

  • mean: Mean expression of the corresponding gene in each cluster.

Interpreting the outputs

How to read and interpret the results?

The key files are significant_means.txt (for statistical_analysis) or relevant_interactions.txt (for degs_analysis), see below. When interpreting the results, we recommend you first define your questions of interest. Next, focus on specific cell type pairs and manually review the interactions prioritising those with lower p-value and/or higher mean expression. For graphical representation we recommend @zktuong repository: ktplots in R and ktplotspy in python.
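The manual review suggested above, focusing on one cell type pair and prioritising higher means, might look like the following sketch. It parses a small tab-separated table with the standard library; the table contents, the interaction labels and the `T_cell|Monocyte` column naming are invented for illustration and may differ from your actual output.

```python
import csv
import io

# Toy stand-in for a significant_means table (invented values).
TOY_TABLE = "\n".join([
    "interacting_pair\tT_cell|Monocyte\tMonocyte|T_cell",
    "pair_1\t1.25\t0",
    "pair_2\t0\t0.80",
    "pair_3\t2.10\t0.35",
])

def top_interactions(table_text, cell_pair, n=10):
    """Interactions with a non-zero value for `cell_pair`, highest mean first."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    hits = [
        (row["interacting_pair"], float(row[cell_pair]))
        for row in reader
        if row[cell_pair] and float(row[cell_pair]) > 0
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)[:n]

ranked = top_interactions(TOY_TABLE, "T_cell|Monocyte")
```

The same function applied to the `Monocyte|T_cell` column would return a different list, reflecting that each direction is a separate column.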

CellphoneDB output is high-throughput. CellphoneDB provides all cell-cell interactions that may potentially occur in your dataset, given the expression of the cells. The size of the output may be overwhelming, but if you apply some rationale (which will depend on the design of your experiment and your biological question), you will be able to narrow it down to a few candidate interactions. The new method degs_analysis will allow you to perform a more tailored analysis towards specific cell-types or conditions, while the option microenvs will allow you to restrict the combinations of cell-type pairs to test.

It may be that not all of the cell-types of your input dataset co-appear in time and space. Cell types that do not co-appear in time and space will not interact. For example, you might have cells coming from different in vitro systems, different developmental stages or disease and control conditions. Use this prior information to restrict and ignore infeasible cell-type combinations from the outputs (i.e., columns) as well as their associated interactions (i.e. rows). You can restrict the analysis to feasible cell-type combinations using the option microenvs. Here you can input a two-column file indicating which cell type is in which spatiotemporal microenvironment. CellphoneDB will use this information to define possible pairs of interacting cells (i.e. pairs of clusters co-existing in a microenvironment) ignoring the rest of combinations.
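How a two-column microenvironment table restricts the cell type pairs can be sketched as follows. This only illustrates the idea and is not CellphoneDB's implementation: the function name and the toy rows are hypothetical, and including a cell type paired with itself is an assumption.

```python
from collections import defaultdict
from itertools import product

def feasible_pairs(rows):
    """Ordered cell type pairs that share at least one microenvironment.

    rows: iterable of (cell_type, microenvironment) tuples.
    """
    by_env = defaultdict(list)
    for cell_type, env in rows:
        by_env[env].append(cell_type)
    pairs = set()
    for members in by_env.values():
        # Ordered pairs, because A|B and B|A are distinct comparisons.
        pairs.update(product(members, repeat=2))
    return pairs

# Toy microenvironment assignment (invented).
microenvs = [
    ("T_cell", "tumour"),
    ("Monocyte", "tumour"),
    ("Hepatocyte", "liver"),
]
pairs = feasible_pairs(microenvs)
```

Here T cells and monocytes can be paired in both directions, while hepatocytes are never paired with either because they sit in a different microenvironment.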

Most importantly, the result files present each interaction as a receptor-ligand pair rather than the usual ligand-receptor pair.

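Just a quick note. Life is good, and even better with you.
© Copyright belongs to the author. For reposting or content collaboration, please contact the author.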