10X单细胞空间联合分析回顾之squidpy

新的一年，新的开始，既然选择了前方，只顾风雨兼程。

这一次，我们来继续回顾一下空间转录组的分析软件-------squidpy，当然了，我们主要关注10X空间转录组的分析内容。

Import packages & data

import scanpy as sc
import anndata as ad
import squidpy as sq

import numpy as np
import pandas as pd

sc.logging.print_header()
print(f"squidpy=={sq.__version__}")

# load the pre-processed dataset
img = sq.datasets.visium_hne_image()
adata = sq.datasets.visium_hne_adata()

First, let’s visualize cluster annotation in spatial context with [`scanpy.pl.spatial()`].(看来空间转录组是提前分析过的,我们要关注一下这里的空间注释)

sc.pl.spatial(adata, color="cluster")

图片.png

Image features

Visium 数据集包含用于基因提取的组织的高分辨率图像。使用函数 squidpy.im.calculate_image_features() 可以计算每个 Visium 点的图像特征并在 adata 中创建 obs x features矩阵，然后可以与 obs x gene基因表达矩阵一起分析。

通过提取图像特征，我们的目标是获得与基因表达值相似和互补的信息。例如，在具有形态不同的两种不同细胞类型的组织的情况下存在类似信息。这样的细胞类型信息然后包含在基因表达值和组织图像特征中。

Squidpy 包含几个特征提取器和一个灵活的计算不同尺度和大小特征的管道。有几个关于如何使用 squidpy.im.calculate_image_features() 的详细示例。提取图像特征为学习更多信息提供了一个很好的起点。 (这个我们放在后面)

# calculate features for different scales (higher value means more context)
for scale in [1.0, 2.0]:
    feature_name = f"features_summary_scale{scale}"
    sq.im.calculate_image_features(
        adata,
        img.compute(),
        features="summary",
        key_added=feature_name,
        n_jobs=4,
        scale=scale,
    )


# combine features in one dataframe
adata.obsm["features"] = pd.concat(
    [adata.obsm[f] for f in adata.obsm.keys() if "features_summary" in f], axis="columns"
)
# make sure that we have no duplicated feature names in the combined table
adata.obsm["features"].columns = ad.utils.make_index_unique(adata.obsm["features"].columns)

可以使用提取的图像特征来计算新的cluster注释。 这可能有助于根据图像形态深入了解点之间的相似性。

# helper function returning a clustering
def cluster_features(features: pd.DataFrame, like=None) -> pd.Series:
    """
    Calculate leiden clustering of features.

    Specify filter of features using `like`.
    """
    # filter features
    if like is not None:
        features = features.filter(like=like)
    # create temporary adata to calculate the clustering
    adata = ad.AnnData(features)
    # important - feature values are not scaled, so need to scale them before PCA
    sc.pp.scale(adata)
    # calculate leiden clustering
    sc.pp.pca(adata, n_comps=min(10, features.shape[1] - 1))
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata)

    return adata.obs["leiden"]


# calculate feature clusters
adata.obs["features_cluster"] = cluster_features(adata.obsm["features"], like="summary")

# compare feature and gene clusters
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.pl.spatial(adata, color=["features_cluster", "cluster"])

图片.png

比较基因和特征cluster，我们注意到在某些区域，它们看起来非常相似，比如cluster Fiber_tract，或者海马周围的cluster似乎被图像特征空间中的cluster粗略地概括了。在其他情况下，特征cluster看起来不同，比如在皮层中，基因cluster显示了皮层的分层结构，而特征cluster似乎显示了皮层的不同区域。

这只是对图像特征的简单比较分析，请注意，还可以使用图像特征来例如通过计算共享邻居图（例如在两个特征空间上的连接 PCA）来计算公共图像和基因聚类。

Spatial statistics and graph analysis

与其他空间数据类似，可以利用 Visium 数据中的空间和图形统计来研究空间组织。

Neighborhood enrichment(邻居富集)

计算邻域富集可以帮助我们识别在整个组织中共享公共邻域结构的spots clusters。可以使用以下函数计算这样的分数：squidpy.gr.nhood_enrichment()。简而言之，它是聚类空间接近度的丰富度得分：如果属于两个不同聚类的点通常彼此靠近，那么它们将具有高得分，可以定义为被 enriched。另一方面，如果它们相距很远，因此很少是邻域，则分数将很低，可以将它们定义为depleted。此分数基于置换检验，您可以使用 n_perms 参数（默认为 1000）设置。

由于该函数适用于连接矩阵，因此我们也需要计算它。这可以通过 squidpy.gr.spatial_neighbors() 来完成。有关此功能如何工作的更多详细信息，请参阅构建空间邻居图（这个文章后面会详细分享）。

Finally, we’ll directly visualize the results with `squidpy.pl.nhood_enrichment()`.

sq.gr.spatial_neighbors(adata)
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(adata, cluster_key="cluster")

图片.png

Given the spatial organization of the mouse brain coronal section, not surprisingly we find high neighborhood enrichment the Hippocampus region: Pyramidal_layer_dentate_gyrus and Pyramidal_layer clusters seems to be often neighbors with the larger Hippocampus cluster.

Co-occurrence across spatial dimensions(共定位)

除了邻居富集分数，还可以在空间维度上可视化cluster共现。这是对上述分析的类似分析，但它不是对连接矩阵进行操作，而是对原始空间坐标进行操作。共现分数定义为：

图片.png

where $p(exp/cond)$ is the conditional probability of observing a cluster $exp$ conditioned on the presence of a cluster $cond$ , whereas $p(exp)$ is the probability of observing $exp$ in the radius size of interest.该分数是通过增加组织中每个观察（即此处的点）周围的半径大小来计算的。

We are gonna compute such score with `squidpy.gr.co_occurrence()` and set the cluster annotation for the conditional probability with the argument `clusters`. Then, we visualize the results with `squidpy.pl.co_occurrence()`

sq.gr.co_occurrence(adata, cluster_key="cluster")
sq.pl.co_occurrence(
    adata,
    cluster_key="cluster",
    clusters="Hippocampus",
    figsize=(8, 4),
)

图片.png

The result largely recapitulates the previous analysis: the Pyramidal_layer cluster seem to co-occur at short distances with the larger Hippocampus cluster. It should be noted that the distance units are given in pixels of the Visium source_image, and corresponds to the same unit of the spatial coordinates saved in adata.obsm['spatial'].

Ligand-receptor interaction analysis

将继续分析显示与空间分子数据分析非常相关的几个特征级方法。例如，在成对cluster共现进行量化之后，可能对寻找可能驱动细胞通信的分子实例感兴趣。这自然转化为配体-受体相互作用分析。在 Squidpy 中，提供了一种快速重新实现的流行方法 CellPhoneDB，并使用流行的数据库扩展了其带注释的配体-受体相互作用对的数据库。可以使用 squidpy.gr.ligrec() 对所有cluster对和所有基因进行分析。此外，我们将直接可视化结果，过滤掉低表达的基因（使用 mean_range 参数）并增加调整后的 p 值的阈值（使用 alpha 参数）。

sq.gr.ligrec(
    adata,
    n_perms=100,
    cluster_key="cluster",
)
sq.pl.ligrec(
    adata,
    cluster_key="cluster",
    source_groups="Hippocampus",
    target_groups=["Pyramidal_layer", "Pyramidal_layer_dentate_gyrus"],
    means_range=(3, np.inf),
    alpha=1e-4,
    swap_axes=True,
)

图片.png

点图可视化提供了一组有趣的候选配体-受体注释，可能涉及海马中的细胞相互作用。例如，更精细的分析是将这些结果与去卷积方法的结果相结合，以了解该组织区域中存在的单细胞类型的比例是多少。

Spatially variable genes with Moran’s I(空间高变基因)

最后，我们可能对寻找显示空间模式的基因感兴趣。有几种方法旨在明确解决这个问题，基于点过程或高斯过程回归框架：

SPARK
Spatial DE
trendsceek
HMRF

这里提供了一种基于著名的 Moran's I 统计量的简单方法，该方法实际上也用作上面列出的空间变量基因论文中的基线方法。 Squidpy 中的函数称为 squidpy.gr.spatial_autocorr()，并在 anndata.AnnData.var 槽中返回测试统计量和调整后的 p 值。由于时间原因，我们将仅评估高度可变基因的子集。

genes = adata[:, adata.var.highly_variable].var_names.values[:1000]
sq.gr.spatial_autocorr(
    adata,
    mode="moran",
    genes=genes,
    n_perms=100,
    n_jobs=1,
)

The results are saved in adata.uns['moranI'] slot. Genes have already been sorted by Moran’s I statistic.

图片.png

sc.pl.spatial(adata, color=["Olfm1", "Plp1", "Itpka", "cluster"])

图片.png

接下来要分步解读了

Compute centrality scores

This example shows how to compute centrality scores, given a spatial graph and cell type annotation.

The scores calculated are closeness centrality, degree centrality and clustering coefficient with the following properties:

closeness centrality - measure of how close the group is to other nodes.
clustering coefficient - measure of the degree to which nodes cluster together.
degree centrality - fraction of non-group members connected to group members.

All scores are descriptive statistics of the spatial graph.

import squidpy as sq

adata = sq.datasets.imc()
adata

This dataset contains cell type annotations in anndata.AnnData.obs, which are used for calculation of centrality scores. First, we need to compute a connectivity matrix from spatial coordinates. We can use squidpy.gr.spatial_neighbors() for this purpose.

sq.gr.spatial_neighbors(adata)
### Centrality scores are calculated with [`squidpy.gr.centrality_scores()`]
sq.gr.centrality_scores(adata, "cell type")
###And visualize results with [`squidpy.pl.centrality_scores()`]
sq.pl.centrality_scores(adata, "cell type")

图片.png

Compute co-occurrence probability

图片.png

import scanpy as sc
import squidpy as sq

adata = sq.datasets.imc()
adata

We can compute the co-occurrence score with squidpy.gr.co_occurrence(). Results can be visualized with squidpy.pl.co_occurrence().

sq.gr.co_occurrence(adata, cluster_key="cell type")
sq.pl.co_occurrence(adata, cluster_key="cell type", clusters="basal CK tumor cell")

图片.png

We can further visualize tissue organization in spatial coordinates with `scanpy.pl.spatial()`.

图片.png

Compute interaction matrix

This example shows how to compute the interaction matrix.

The interaction matrix quantifies the number of edges that nodes belonging to a given annotation shares with the other annotations. It’s a descriptive statistics of the spatial graph.

import squidpy as sq

adata = sq.datasets.imc()
adata

First, we need to compute a connectivity matrix from spatial coordinates. We can use `squidpy.gr.spatial_neighbors()` for this purpose.

sq.gr.spatial_neighbors(adata)

We can compute the interaction matrix with squidpy.gr.interaction_matrix(). Specify normalized = True if you want a row-normalized matrix. Results can be visualized with squidpy.pl.interaction_matrix().

sq.gr.interaction_matrix(adata, cluster_key="cell type")
sq.pl.interaction_matrix(adata, cluster_key="cell type")

图片.png

Receptor-ligand analysis

import squidpy as sq

adata = sq.datasets.seqfish()
adata

To get started, we just need an anndata.AnnData object with some clustering information. Below are some useful parameters of squidpy.gr.ligrec():

n_perms - number of permutations for the permutation test.

interactions - list of interaction, by default we fetch all available interactions from [Türei et al., 2016].

{interactions,transmitter,receiver}_params - parameters used if downloading the interactions, see omnipah.interactions.import_intercell_network() for more information.

threshold - percentage of cells required to be expressed in a given cluster.

corr_method - false discovery rate (FDR) correction method to use.

Since we’re interested in receptors and ligands in this example, we specify these categories in receiver_params and transmitter_params, respectively. If desired, we can also restrict the resources to just a select few. For example, in order to only use [Efremova et al., 2020], set interactions_params={'resources': 'CellPhoneDB'}.

res = sq.gr.ligrec(
    adata,
    n_perms=1000,
    cluster_key="celltype_mapped_refined",
    copy=True,
    use_raw=False,
    transmitter_params={"categories": "ligand"},
    receiver_params={"categories": "receptor"},
)

First, we inspect the calculated means. The resulting object is a `pandas.DataFrame`, with rows corresponding to interacting pairs and columns to cluster combinations.

res["means"].head()

图片.png

Next, we take a look at the p-values. If corr_method != None, this will contained the corrected p-values. The p-values marked as NaN correspond to interactions, which did not pass the filtering threshold specified above.

res["pvalues"].head()

图片.png

In order to plot the results, we can run squidpy.pl.ligrec(). Some useful parameters are:

{source,target}_groups - only plot specific source/target clusters.

dendrogram - whether to hierarchically cluster the rows, columns or both.

mean_range - plot only interactions whose means are in this range.

pval_threshold - plot only interactions whose p-values are below this threshold.

In the plot below, to highlight significance, we’ve marked all p-values <= 0.005 with tori.

sq.pl.ligrec(res, source_groups="Erythroid", alpha=0.005)

图片.png

Compute Moran’s I score

This example shows how to compute the Moran’s I global spatial auto-correlation statistics.

The Moran’s I global spatial auto-correlation statistics evaluates whether features (i.e. genes) shows a pattern that is clustered, dispersed or random in the tissue are under consideration.

import scanpy as sc
import squidpy as sq

adata = sq.datasets.visium_hne_adata()
genes = adata[:, adata.var.highly_variable].var_names.values[:100]
sq.gr.spatial_neighbors(adata)
sq.gr.moran(
    adata,
    genes=genes,
    n_perms=100,
    n_jobs=1,
)
adata.uns["moranI"].head(10)

图片.png

sc.pl.spatial(adata, color=["Resp18", "Tuba4a"])

图片.png

Neighbors enrichment analysis

This example shows how to run the neighbors enrichment analysis routine.

It calculates an enrichment score based on proximity on the connectivity graph of cell clusters. The number of observed events is compared against $N$
permutations and a z-score is computed.

import squidpy as sq
adata = sq.datasets.visium_fluo_adata()
####This dataset contains cell type annotations in anndata.Anndata.obs which are used for calculation of the neighborhood enrichment. First, we need to compute a connectivity matrix from spatial coordinates.
sq.gr.spatial_neighbors(adata)
####Then we can calculate the neighborhood enrichment score with [`squidpy.gr.nhood_enrichment()`](https://squidpy.readthedocs.io/en/latest/api/squidpy.gr.nhood_enrichment.html#squidpy.gr.nhood_enrichment "squidpy.gr.nhood_enrichment").
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(adata, cluster_key="cluster")

图片.png

Building spatial neighbors graph

This example shows how to compute a spatial neighbors graph.

Spatial graph is a graph of spatial neighbors with observations as nodes and neighbor-hood relations between observations as edges. We use spatial coordinates of spots/cells to identify neighbors among them. Different approach of defining a neighborhood relation among observations are used for different types of spatial datasets.

import scanpy as sc
import squidpy as sq

import numpy as np

First, we show how to compute the spatial neighbors graph for a Visium dataset.

adata = sq.datasets.visium_fluo_adata()

We use `squidpy.gr.spatial_neighbors()` for this. The function expects `coord_type = 'visium'` by default. We set this parameter here explicitly for clarity. `n_rings` should be used only for Visium datasets. It specifies for each spot how many hexagonal rings of spots around will be considered neighbors.

sq.gr.spatial_neighbors(adata, n_rings=2, coord_type="grid", n_neighs=6)

The function builds a spatial graph and saves its adjacency matrix to adata.obsp['spatial_connectivities'] and weighted adjacency matrix to adata.obsp['spatial_distances'] by default. Note that it can also build a a graph from a square grid, just set n_neighs = 4.

adata.obsp["spatial_connectivities"]

The weights of the weighted adjacency matrix are ordinal numbers of hexagonal rings in the case of coord_type = 'visium'.

adata.obsp["spatial_distances"]

We can visualize the neighbors of a point to better visualize what n_rings mean:

_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    neighbors_key="spatial_neighbors",
    edges=True,
    edges_width=1,
    img_key=None,
)

图片.png

sq.gr.spatial_neighbors(adata, n_neighs=10, coord_type="generic")
_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    color="cell type",
    neighbors_key="spatial_neighbors",
    spot_size=1,
    edges=True,
    edges_width=1,
    img_key=None,
)

图片.png

We use the same function for this with coord_type = 'generic' and delaunay = True. You can appreciate that the neighbor graph is slightly different than before.

sq.gr.spatial_neighbors(adata, delaunay=True, coord_type="generic")
_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    color="cell type",
    neighbors_key="spatial_neighbors",
    spot_size=1,
    edges=True,
    edges_width=1,
    img_key=None,
)

图片.png

In order to get all spots within a specified radius (in units of the spatial coordinates) from each spot as neighbors, the parameter radius should be used.

sq.gr.spatial_neighbors(adata, radius=0.3, coord_type="generic")

adata.obsp["spatial_connectivities"]
adata.obsp["spatial_distances"]

生活很好，有你更好