10X单细胞空间联合分析回顾之squidpy

新的一年,新的开始,既然选择了前方,只顾风雨兼程。

这一次,我们来继续回顾一下空间转录组的分析软件-------squidpy,当然了,我们主要关注10X空间转录组的分析内容。

Import packages & data

import scanpy as sc
import anndata as ad
import squidpy as sq

import numpy as np
import pandas as pd

sc.logging.print_header()
print(f"squidpy=={sq.__version__}")

# load the pre-processed dataset
img = sq.datasets.visium_hne_image()
adata = sq.datasets.visium_hne_adata()
First, let’s visualize cluster annotation in spatial context with [scanpy.pl.spatial()].(看来空间转录组是提前分析过的,我们要关注一下这里的空间注释)
sc.pl.spatial(adata, color="cluster")
图片.png

Image features

Visium 数据集包含用于基因提取的组织的高分辨率图像。 使用函数 squidpy.im.calculate_image_features() 可以计算每个 Visium 点的图像特征并在 adata 中创建 obs x features矩阵,然后可以与 obs x gene基因表达矩阵一起分析。

通过提取图像特征,我们的目标是获得与基因表达值相似和互补的信息。 例如,在具有形态不同的两种不同细胞类型的组织的情况下存在类似信息。 这样的细胞类型信息然后包含在基因表达值和组织图像特征中。

Squidpy 包含几个特征提取器和一个灵活的计算不同尺度和大小特征的管道。 有几个关于如何使用 squidpy.im.calculate_image_features() 的详细示例。 提取图像特征为学习更多信息提供了一个很好的起点。 (这个我们放在后面)

# calculate features for different scales (higher value means more context)
for scale in [1.0, 2.0]:
    feature_name = f"features_summary_scale{scale}"
    sq.im.calculate_image_features(
        adata,
        img.compute(),
        features="summary",
        key_added=feature_name,
        n_jobs=4,
        scale=scale,
    )


# combine features in one dataframe
adata.obsm["features"] = pd.concat(
    [adata.obsm[f] for f in adata.obsm.keys() if "features_summary" in f], axis="columns"
)
# make sure that we have no duplicated feature names in the combined table
adata.obsm["features"].columns = ad.utils.make_index_unique(adata.obsm["features"].columns)

可以使用提取的图像特征来计算新的cluster注释。 这可能有助于根据图像形态深入了解点之间的相似性

# helper function returning a clustering
def cluster_features(features: pd.DataFrame, like=None) -> pd.Series:
    """
    Calculate leiden clustering of features.

    Specify filter of features using `like`.
    """
    # filter features
    if like is not None:
        features = features.filter(like=like)
    # create temporary adata to calculate the clustering
    adata = ad.AnnData(features)
    # important - feature values are not scaled, so need to scale them before PCA
    sc.pp.scale(adata)
    # calculate leiden clustering
    sc.pp.pca(adata, n_comps=min(10, features.shape[1] - 1))
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata)

    return adata.obs["leiden"]


# calculate feature clusters
adata.obs["features_cluster"] = cluster_features(adata.obsm["features"], like="summary")

# compare feature and gene clusters
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.pl.spatial(adata, color=["features_cluster", "cluster"])
图片.png

比较基因和特征cluster,我们注意到在某些区域,它们看起来非常相似,比如cluster Fiber_tract,或者海马周围的cluster似乎被图像特征空间中的cluster粗略地概括了。 在其他情况下,特征cluster看起来不同,比如在皮层中,基因cluster显示了皮层的分层结构,而特征cluster似乎显示了皮层的不同区域。

这只是对图像特征的简单比较分析,请注意,还可以使用图像特征来例如 通过计算共享邻居图(例如在两个特征空间上的连接 PCA)来计算公共图像和基因聚类

Spatial statistics and graph analysis

与其他空间数据类似,可以利用 Visium 数据中的空间和图形统计来研究空间组织

Neighborhood enrichment(邻居富集)

计算邻域富集可以帮助我们识别在整个组织中共享公共邻域结构的spots clusters。 可以使用以下函数计算这样的分数:squidpy.gr.nhood_enrichment()。 简而言之,它是聚类空间接近度的丰富度得分:如果属于两个不同聚类的点通常彼此靠近,那么它们将具有高得分,可以定义为被 enriched。 另一方面,如果它们相距很远,因此很少是邻域,则分数将很低,可以将它们定义为depleted。 此分数基于置换检验,您可以使用 n_perms 参数(默认为 1000)设置。

由于该函数适用于连接矩阵,因此我们也需要计算它。 这可以通过 squidpy.gr.spatial_neighbors() 来完成。 有关此功能如何工作的更多详细信息,请参阅构建空间邻居图(这个文章后面会详细分享)。

Finally, we’ll directly visualize the results with squidpy.pl.nhood_enrichment().

sq.gr.spatial_neighbors(adata)
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(adata, cluster_key="cluster")
图片.png

Given the spatial organization of the mouse brain coronal section, not surprisingly we find high neighborhood enrichment the Hippocampus region: Pyramidal_layer_dentate_gyrus and Pyramidal_layer clusters seems to be often neighbors with the larger Hippocampus cluster.

Co-occurrence across spatial dimensions(共定位)

除了邻居富集分数,还可以在空间维度上可视化cluster共现。 这是对上述分析的类似分析,但它不是对连接矩阵进行操作,而是对原始空间坐标进行操作。 共现分数定义为:

图片.png

where p(exp/cond) is the conditional probability of observing a cluster exp conditioned on the presence of a cluster cond, whereas p(exp) is the probability of observing exp in the radius size of interest.该分数是通过增加组织中每个观察(即此处的点)周围的半径大小来计算的

We are gonna compute such score with squidpy.gr.co_occurrence() and set the cluster annotation for the conditional probability with the argument clusters. Then, we visualize the results with squidpy.pl.co_occurrence()
sq.gr.co_occurrence(adata, cluster_key="cluster")
sq.pl.co_occurrence(
    adata,
    cluster_key="cluster",
    clusters="Hippocampus",
    figsize=(8, 4),
)
图片.png
The result largely recapitulates the previous analysis: the Pyramidal_layer cluster seem to co-occur at short distances with the larger Hippocampus cluster. It should be noted that the distance units are given in pixels of the Visium source_image, and corresponds to the same unit of the spatial coordinates saved in adata.obsm['spatial'].

Ligand-receptor interaction analysis

将继续分析显示与空间分子数据分析非常相关的几个特征级方法。例如,在成对cluster共现进行量化之后,可能对寻找可能驱动细胞通信的分子实例感兴趣。这自然转化为配体-受体相互作用分析。在 Squidpy 中,提供了一种快速重新实现的流行方法 CellPhoneDB,并使用流行的数据库扩展了其带注释的配体-受体相互作用对的数据库。可以使用 squidpy.gr.ligrec() 对所有cluster对和所有基因进行分析。此外,我们将直接可视化结果,过滤掉低表达的基因(使用 mean_range 参数)并增加调整后的 p 值的阈值(使用 alpha 参数)。

sq.gr.ligrec(
    adata,
    n_perms=100,
    cluster_key="cluster",
)
sq.pl.ligrec(
    adata,
    cluster_key="cluster",
    source_groups="Hippocampus",
    target_groups=["Pyramidal_layer", "Pyramidal_layer_dentate_gyrus"],
    means_range=(3, np.inf),
    alpha=1e-4,
    swap_axes=True,
)
图片.png

点图可视化提供了一组有趣的候选配体-受体注释,可能涉及海马中的细胞相互作用。 例如,更精细的分析是将这些结果与去卷积方法的结果相结合,以了解该组织区域中存在的单细胞类型的比例是多少。

Spatially variable genes with Moran’s I(空间高变基因)

最后,我们可能对寻找显示空间模式的基因感兴趣。 有几种方法旨在明确解决这个问题,基于点过程或高斯过程回归框架:

  • SPARK
  • Spatial DE
  • trendsceek
  • HMRF

这里提供了一种基于著名的 Moran's I 统计量的简单方法,该方法实际上也用作上面列出的空间变量基因论文中的基线方法。 Squidpy 中的函数称为 squidpy.gr.spatial_autocorr(),并在 anndata.AnnData.var 槽中返回测试统计量和调整后的 p 值。 由于时间原因,我们将仅评估高度可变基因的子集。

genes = adata[:, adata.var.highly_variable].var_names.values[:1000]
sq.gr.spatial_autocorr(
    adata,
    mode="moran",
    genes=genes,
    n_perms=100,
    n_jobs=1,
)

The results are saved in adata.uns['moranI'] slot. Genes have already been sorted by Moran’s I statistic.

图片.png
sc.pl.spatial(adata, color=["Olfm1", "Plp1", "Itpka", "cluster"])
图片.png

接下来要分步解读了

Compute centrality scores

This example shows how to compute centrality scores, given a spatial graph and cell type annotation.

The scores calculated are closeness centrality, degree centrality and clustering coefficient with the following properties:

  • closeness centrality - measure of how close the group is to other nodes.
  • clustering coefficient - measure of the degree to which nodes cluster together.
  • degree centrality - fraction of non-group members connected to group members.

All scores are descriptive statistics of the spatial graph.

import squidpy as sq

adata = sq.datasets.imc()
adata

This dataset contains cell type annotations in anndata.AnnData.obs, which are used for calculation of centrality scores. First, we need to compute a connectivity matrix from spatial coordinates. We can use squidpy.gr.spatial_neighbors() for this purpose.

sq.gr.spatial_neighbors(adata)
### Centrality scores are calculated with [`squidpy.gr.centrality_scores()`]
sq.gr.centrality_scores(adata, "cell type")
###And visualize results with [`squidpy.pl.centrality_scores()`]
sq.pl.centrality_scores(adata, "cell type")
图片.png

Compute co-occurrence probability

除了邻居富集分数,还可以在空间维度上可视化cluster共现。 这是对上述分析的类似分析,但它不是对连接矩阵进行操作,而是对原始空间坐标进行操作。 共现分数定义为:

图片.png

where p(exp/cond) is the conditional probability of observing a cluster exp conditioned on the presence of a cluster cond, whereas p(exp) is the probability of observing exp in the radius size of interest.该分数是通过增加组织中每个观察(即此处的点)周围的半径大小来计算的

import scanpy as sc
import squidpy as sq

adata = sq.datasets.imc()
adata

We can compute the co-occurrence score with squidpy.gr.co_occurrence(). Results can be visualized with squidpy.pl.co_occurrence().

sq.gr.co_occurrence(adata, cluster_key="cell type")
sq.pl.co_occurrence(adata, cluster_key="cell type", clusters="basal CK tumor cell")
图片.png

We can further visualize tissue organization in spatial coordinates with scanpy.pl.spatial().

图片.png

Compute interaction matrix

This example shows how to compute the interaction matrix.

The interaction matrix quantifies the number of edges that nodes belonging to a given annotation shares with the other annotations. It’s a descriptive statistics of the spatial graph.

import squidpy as sq

adata = sq.datasets.imc()
adata
First, we need to compute a connectivity matrix from spatial coordinates. We can use squidpy.gr.spatial_neighbors() for this purpose.
sq.gr.spatial_neighbors(adata)

We can compute the interaction matrix with squidpy.gr.interaction_matrix(). Specify normalized = True if you want a row-normalized matrix. Results can be visualized with squidpy.pl.interaction_matrix().

sq.gr.interaction_matrix(adata, cluster_key="cell type")
sq.pl.interaction_matrix(adata, cluster_key="cell type")
图片.png

Receptor-ligand analysis

import squidpy as sq

adata = sq.datasets.seqfish()
adata

To get started, we just need an anndata.AnnData object with some clustering information. Below are some useful parameters of squidpy.gr.ligrec():

  • n_perms - number of permutations for the permutation test.
  • interactions - list of interaction, by default we fetch all available interactions from [Türei et al., 2016].
  • {interactions,transmitter,receiver}_params - parameters used if downloading the interactions, see omnipah.interactions.import_intercell_network() for more information.
  • threshold - percentage of cells required to be expressed in a given cluster.
  • corr_method - false discovery rate (FDR) correction method to use.

Since we’re interested in receptors and ligands in this example, we specify these categories in receiver_params and transmitter_params, respectively. If desired, we can also restrict the resources to just a select few. For example, in order to only use [Efremova et al., 2020], set interactions_params={'resources': 'CellPhoneDB'}.

res = sq.gr.ligrec(
    adata,
    n_perms=1000,
    cluster_key="celltype_mapped_refined",
    copy=True,
    use_raw=False,
    transmitter_params={"categories": "ligand"},
    receiver_params={"categories": "receptor"},
)
First, we inspect the calculated means. The resulting object is a pandas.DataFrame, with rows corresponding to interacting pairs and columns to cluster combinations.
res["means"].head()
图片.png
Next, we take a look at the p-values. If corr_method != None, this will contained the corrected p-values. The p-values marked as NaN correspond to interactions, which did not pass the filtering threshold specified above.
res["pvalues"].head()
图片.png

In order to plot the results, we can run squidpy.pl.ligrec(). Some useful parameters are:

  • {source,target}_groups - only plot specific source/target clusters.
  • dendrogram - whether to hierarchically cluster the rows, columns or both.
  • mean_range - plot only interactions whose means are in this range.
  • pval_threshold - plot only interactions whose p-values are below this threshold.

In the plot below, to highlight significance, we’ve marked all p-values <= 0.005 with tori.

sq.pl.ligrec(res, source_groups="Erythroid", alpha=0.005)
图片.png

Compute Moran’s I score

This example shows how to compute the Moran’s I global spatial auto-correlation statistics.

The Moran’s I global spatial auto-correlation statistics evaluates whether features (i.e. genes) shows a pattern that is clustered, dispersed or random in the tissue are under consideration.

import scanpy as sc
import squidpy as sq

adata = sq.datasets.visium_hne_adata()
genes = adata[:, adata.var.highly_variable].var_names.values[:100]
sq.gr.spatial_neighbors(adata)
sq.gr.moran(
    adata,
    genes=genes,
    n_perms=100,
    n_jobs=1,
)
adata.uns["moranI"].head(10)
图片.png
sc.pl.spatial(adata, color=["Resp18", "Tuba4a"])
图片.png

Neighbors enrichment analysis

This example shows how to run the neighbors enrichment analysis routine.

It calculates an enrichment score based on proximity on the connectivity graph of cell clusters. The number of observed events is compared against N
permutations and a z-score is computed.

import squidpy as sq
adata = sq.datasets.visium_fluo_adata()
####This dataset contains cell type annotations in anndata.Anndata.obs which are used for calculation of the neighborhood enrichment. First, we need to compute a connectivity matrix from spatial coordinates.
sq.gr.spatial_neighbors(adata)
####Then we can calculate the neighborhood enrichment score with [`squidpy.gr.nhood_enrichment()`](https://squidpy.readthedocs.io/en/latest/api/squidpy.gr.nhood_enrichment.html#squidpy.gr.nhood_enrichment "squidpy.gr.nhood_enrichment").
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(adata, cluster_key="cluster")
图片.png

Building spatial neighbors graph

This example shows how to compute a spatial neighbors graph.

Spatial graph is a graph of spatial neighbors with observations as nodes and neighbor-hood relations between observations as edges. We use spatial coordinates of spots/cells to identify neighbors among them. Different approach of defining a neighborhood relation among observations are used for different types of spatial datasets.

import scanpy as sc
import squidpy as sq

import numpy as np
First, we show how to compute the spatial neighbors graph for a Visium dataset.
adata = sq.datasets.visium_fluo_adata()
We use squidpy.gr.spatial_neighbors() for this. The function expects coord_type = 'visium' by default. We set this parameter here explicitly for clarity. n_rings should be used only for Visium datasets. It specifies for each spot how many hexagonal rings of spots around will be considered neighbors.
sq.gr.spatial_neighbors(adata, n_rings=2, coord_type="grid", n_neighs=6)
The function builds a spatial graph and saves its adjacency matrix to adata.obsp['spatial_connectivities'] and weighted adjacency matrix to adata.obsp['spatial_distances'] by default. Note that it can also build a a graph from a square grid, just set n_neighs = 4.
adata.obsp["spatial_connectivities"]
The weights of the weighted adjacency matrix are ordinal numbers of hexagonal rings in the case of coord_type = 'visium'.
adata.obsp["spatial_distances"]
We can visualize the neighbors of a point to better visualize what n_rings mean:
_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    neighbors_key="spatial_neighbors",
    edges=True,
    edges_width=1,
    img_key=None,
)
图片.png
sq.gr.spatial_neighbors(adata, n_neighs=10, coord_type="generic")
_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    color="cell type",
    neighbors_key="spatial_neighbors",
    spot_size=1,
    edges=True,
    edges_width=1,
    img_key=None,
)
图片.png

We use the same function for this with coord_type = 'generic' and delaunay = True. You can appreciate that the neighbor graph is slightly different than before.

sq.gr.spatial_neighbors(adata, delaunay=True, coord_type="generic")
_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    color="cell type",
    neighbors_key="spatial_neighbors",
    spot_size=1,
    edges=True,
    edges_width=1,
    img_key=None,
)
图片.png
In order to get all spots within a specified radius (in units of the spatial coordinates) from each spot as neighbors, the parameter radius should be used.
sq.gr.spatial_neighbors(adata, radius=0.3, coord_type="generic")

adata.obsp["spatial_connectivities"]
adata.obsp["spatial_distances"]

生活很好,有你更好

©著作权归作者所有,转载或内容合作请联系作者
禁止转载,如需转载请通过简信或评论联系作者。
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 204,445评论 6 478
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 85,889评论 2 381
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 151,047评论 0 337
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 54,760评论 1 276
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 63,745评论 5 367
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,638评论 1 281
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 38,011评论 3 398
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,669评论 0 258
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,923评论 1 299
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,655评论 2 321
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,740评论 1 330
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,406评论 4 320
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,995评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,961评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,197评论 1 260
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 45,023评论 2 350
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,483评论 2 342

推荐阅读更多精彩内容