单细胞转录组数据分析|| scanpy教程：PAGA轨迹推断

单细胞转录组数据分析|| scanpy教程：预处理与聚类

说起轨迹推断，很多人的第一印象就是monocle的轨迹图，大概率是长这样子的：

如果说单细胞转录组数据分析中的分群是寻找细胞的离散属性，那么轨迹推断就是寻找细胞分化连续性的尝试。为什么细胞的分化既有离散性又有连续性呢？这是一个历史问题，细胞的分化当然是连续的，之所以用分群的方法来解释异质性，实在是一种无奈之举。每一个细胞都是独一无二的，没有一个细胞是孤岛，这是我们的口号，但是理想与现实总是不能统一。

monocle提供了一套具有启发意义的轨迹方法，一简单粗暴的方式试图弥补这理想与现实的大峡谷。在monocle的世界里轨迹与图谱是分离的，即图谱是tsne/umap的，轨迹是另一个降维空间。那么有没有一种降维技术能够再走一步呢？今天我们介绍的scanpy的PAGA(graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells)就是这方面的一个尝试:在保留细胞图谱的基础上完成细胞轨迹的推断：

它是如何现实的呢？其实是统一了聚类和轨迹推断的空间结构。

基于分区的图抽象(Partition-based graph abstraction )生成单个细胞的拓扑结构并保留映射。高维基因表达数据降维后计算邻域关系的相关距离度量来表示kNN图（pca和欧氏距离）。kNN图按所需的分辨率进行分群，其中分群表示连续细胞群。为此，我们通常使用Louvain算法，然而，分群也可以通过其他任何方式获得。PAGA图是通过将一个节点与每个亚群关联起来，并通过表示亚群之间连接性的统计度量的加权边连接每个节点得到。通过丢弃低权重的假边，PAGA图揭示了数据在选定分辨率下的去噪拓扑结构，并揭示了其连接和断开的区域。

好了，我们来看看scanpy中PAGA是如何实现的吧，好不好？

首先我们导入我们的数据：


import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns 




sc.settings.verbosity = 3             # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_versions()
sc.settings.set_figure_params(dpi=80)

scanpy==1.4.5.1 
anndata==0.7.1 
umap==0.3.10 
numpy==1.16.5 
scipy==1.3.1 
pandas==0.25.1 
scikit-learn==0.21.3 
statsmodels==0.10.1 
python-igraph==0.8.0

results_file = 'E:\learnscanpy\write\pbmc3k.h5ad' 
help(sc.read_10x_mtx)
adata = sc.read_10x_mtx(
    'E:/learnscanpy/data/filtered_feature_bc_matrix',  # the directory with the `.mtx` file
    var_names='gene_symbols',                  # use gene symbols for the variable names (variables-axis index)
    cache=True) 
adata.var_names_make_unique()  # this is unnecessary if using `var_names='gene_ids'` in `sc.read_10x_mtx`

adata
Out[282]: 
AnnData object with n_obs × n_vars = 5025 × 33694 
    var: 'gene_ids', 'feature_types'

这里我们使用zheng17的数据预处理方法，仅仅是因为简单，您也可以像单细胞转录组数据分析|| scanpy教程：预处理与聚类一样自己一步一步地执行预处理。

sc.pp.recipe_zheng17(adata)
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=4, n_pcs=20)
sc.tl.leiden(adata)

也许，你会问这到底是一种怎样的cell QC的过程，不难，来看看它的帮助文档啊，都i给你写好了呢？！

help(sc.pp.recipe_zheng17)

Help on function recipe_zheng17 in module scanpy.preprocessing._recipes:

recipe_zheng17(adata: anndata._core.anndata.AnnData, n_top_genes: int = 1000, log: bool = True, plot: bool = False, copy: bool = False) -> Union[anndata._core.anndata.AnnData, NoneType]
    Normalization and filtering as of [Zheng17]_.
    
    Reproduces the preprocessing of [Zheng17]_ – the Cell Ranger R Kit of 10x
    Genomics.
    
    Expects non-logarithmized data.
    If using logarithmized data, pass `log=False`.
    
    The recipe runs the following steps
    
    .. code:: python
    
        sc.pp.filter_genes(adata, min_counts=1)         # only consider genes with more than 1 count
        sc.pp.normalize_per_cell(                       # normalize with total UMI count per cell
             adata, key_n_counts='n_counts_all'
        )
        filter_result = sc.pp.filter_genes_dispersion(  # select highly-variable genes
            adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False
        )
        adata = adata[:, filter_result.gene_subset]     # subset the genes
        sc.pp.normalize_per_cell(adata)                 # renormalize after filtering
        if log: sc.pp.log1p(adata)                      # log transform: adata.X = log(adata.X + 1)
        sc.pp.scale(adata)                              # scale to unit variance and shift to zero mean
    
    
    Parameters
    ----------
    adata
        Annotated data matrix.
    n_top_genes
        Number of genes to keep.
    log
        Take logarithm.
    plot
        Show a plot of the gene dispersion vs. mean relation.
    copy
        Return a copy of `adata` instead of updating it.
    
    Returns
    -------
    Returns or updates `adata` depending on `copy`.

加个plot参数试试，好使！

sc.tl.draw_graph(adata)
sc.pl.draw_graph(adata, color='leiden', legend_loc='on data')

这里你不要问下没有执行降维，这个细胞图谱是在哪个空间里的呢？这就要问一下sc.tl.draw_graph了：

adata
Out[292]: 
AnnData object with n_obs × n_vars = 5025 × 1000 
    obs: 'n_counts_all', 'leiden'
    var: 'gene_ids', 'feature_types', 'n_counts'
    uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors'
    obsm: 'X_pca', 'X_draw_graph_fr'
    varm: 'PCs'

help(sc.tl.draw_graph)
Help on function draw_graph in module scanpy.tools._draw_graph:

draw_graph(adata: anndata._core.anndata.AnnData, layout: scanpy._compat.Literal_ = 'fa', init_pos: Union[str, bool, NoneType] = None, root: Union[int, NoneType] = None, random_state: Union[int, mtrand.RandomState, NoneType] = 0, n_jobs: Union[int, NoneType] = None, adjacency: Union[scipy.sparse.base.spmatrix, NoneType] = None, key_added_ext: Union[str, NoneType] = None, copy: bool = False, **kwds)
    Force-directed graph drawing [Islam11]_ [Jacomy14]_ [Chippada18]_.
    
    An alternative to tSNE that often preserves the topology of the data
    better. This requires to run :func:`~scanpy.pp.neighbors`, first.
    
    The default layout ('fa', `ForceAtlas2`) [Jacomy14]_ uses the package |fa2|_
    [Chippada18]_, which can be installed via `pip install fa2`.
    
    `Force-directed graph drawing`_ describes a class of long-established
    algorithms for visualizing graphs.
    It has been suggested for visualizing single-cell data by [Islam11]_.
    Many other layouts as implemented in igraph [Csardi06]_ are available.
    Similar approaches have been used by [Zunder15]_ or [Weinreb17]_.
    
    .. |fa2| replace:: `fa2`
    .. _fa2: https://github.com/bhargavchippada/forceatlas2
    .. _Force-directed graph drawing: https://en.wikipedia.org/wiki/Force-directed_graph_drawing
    
    Parameters
    ----------
    adata
        Annotated data matrix.
    layout
        'fa' (`ForceAtlas2`) or any valid `igraph layout
        <http://igraph.org/c/doc/igraph-Layout.html>`__. Of particular interest
        are 'fr' (Fruchterman Reingold), 'grid_fr' (Grid Fruchterman Reingold,
        faster than 'fr'), 'kk' (Kamadi Kawai', slower than 'fr'), 'lgl' (Large
        Graph, very fast), 'drl' (Distributed Recursive Layout, pretty fast) and
        'rt' (Reingold Tilford tree layout).
    root
        Root for tree layouts.
    random_state
        For layouts with random initialization like 'fr', change this to use
        different intial states for the optimization. If `None`, no seed is set.
    adjacency
        Sparse adjacency matrix of the graph, defaults to
        `adata.uns['neighbors']['connectivities']`.
    key_added_ext
        By default, append `layout`.
    proceed
        Continue computation, starting off with 'X_draw_graph_`layout`'.
    init_pos
        `'paga'`/`True`, `None`/`False`, or any valid 2d-`.obsm` key.
        Use precomputed coordinates for initialization.
        If `False`/`None` (the default), initialize randomly.
    copy
        Return a copy instead of writing to adata.
    **kwds
        Parameters of chosen igraph layout. See e.g. `fruchterman-reingold`_
        [Fruchterman91]_. One of the most important ones is `maxiter`.
    
        .. _fruchterman-reingold: http://igraph.org/python/doc/igraph.Graph-class.html#layout_fruchterman_reingold
    
    Returns
    -------
    Depending on `copy`, returns or updates `adata` with the following field.
    
    **X_draw_graph_layout** : `adata.obsm`
        Coordinates of graph layout. E.g. for layout='fa' (the default),
        the field is called 'X_draw_graph_fa'

这是tsne的一种代替方案，由fa2库完成，绘图用的是Force-directed_graph_drawing，说明文档中给出了相关的链接。

那么我们看看umap的结构是怎么样的：

sc.tl.umap(adata)
sc.pl.umap(adata, color=['leiden'])

umap都做了，再看看tsne的结果吧：

sc.tl.tsne(adata)
sc.pl.tsne(adata, color=['leiden'])

果然tsne要慢很多，等了好一会。。。

可选操作：图形去噪

为了去噪，用扩散映射空间来表示它(而不是PCA空间)。计算几个扩散分量内的距离相当于图像去噪——我们只取几个第一个光谱分量。这与使用PCA去噪数据矩阵非常相似。注意，对于PAGA、聚类或伪时间估计，这都不是必要的步骤。

sc.tl.diffmap(adata)
computing Diffusion Maps using n_comps=15(=n_dcs)
computing transitions
    finished (0:00:00)
    eigenvalues of transition matrix
    [1.         1.         1.         0.9997476  0.9994019  0.9990253
     0.9986317  0.99379796 0.9936346  0.9925993  0.99070626 0.9898019
     0.98812026 0.9873447  0.9869766 ]
    finished: added
    'X_diffmap', diffmap coordinates (adata.obsm)
    'diffmap_evals', eigenvalues of transition matrix (adata.uns) (0:00:00)

sc.pp.neighbors(adata, n_neighbors=10, use_rep='X_diffmap')
sc.tl.draw_graph(adata)
sc.pl.draw_graph(adata, color='leiden', legend_loc='on data')

虽然还是有些凌乱，但还是清爽了一些的：

Clustering and PAGA

其实我们已经聚类过了啊，不然哪来的标签呢。

# sc.tl.louvain(adata, resolution=1.0)
# sc.tl.leiden(adata)
sc.tl.paga(adata, groups='leiden')
running PAGA
    finished: added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:00)

adata.var_names
Out[304]: 
Index(['AP006222.2', 'MEGF6', 'RP11-46F15.2', 'GPR153', 'RP5-1113E3.3',
       'RP4-734G22.3', 'MASP2', 'RP3-467K16.2', 'RP4-680D5.2', 'AKR7A3',
       ...
       'MT-CO1', 'MT-CO2', 'MT-ATP6', 'MT-CO3', 'MT-ND3', 'MT-ND4', 'MT-ND5',
       'MT-CYB', 'AC145212.2', 'AC011043.1'],
      dtype='object', length=1000)

adata
Out[308]: 
AnnData object with n_obs × n_vars = 5025 × 1000 
    obs: 'n_counts_all', 'leiden'
    var: 'gene_ids', 'feature_types', 'n_counts'
    uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors', 'umap', 'diffmap_evals', 'paga', 'leiden_sizes'
    obsm: 'X_pca', 'X_draw_graph_fr', 'X_umap', 'X_tsne', 'X_diffmap'
    varm: 'PCs'

sc.pl.paga(adata, color=['leiden', 'MEGF6', 'MT-CO2', 'AKR7A3'])
--> added 'pos', the PAGA positions (adata.uns['paga'])

我们看到一个框架图，其实是一个无向的网络图，点的颜色代表不同分群，点的大小代表该群细胞数的大小，连线只有一种颜色，粗细代表两个群之间连接更密切。这里并没有一个拟时的概念，轨迹是一种相互关系，可以是时间的，也可以不是，而往往以人类的直觉，时间都是单向的，但是每个细胞都有自己的分化方向。一个简单点的框架图，让我们从新思考细胞分化这一基本事实！但是下面我们会在这个基础上推断出一个拟时结构。

adata.obs['leiden'].cat.categories
Out[306]: 
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
       '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24',
       '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35'],
      dtype='object')

遇到核心函数，就算没有什么疑问也是要看说明文档的啊：

help(sc.tl.paga)
Help on function paga in module scanpy.tools._paga:

paga(adata: anndata._core.anndata.AnnData, groups: Union[str, NoneType] = None, use_rna_velocity: bool = False, model: scanpy._compat.Literal_ = 'v1.2', copy: bool = False)
    Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19]_.
    
    By quantifying the connectivity of partitions (groups, clusters) of the
    single-cell graph, partition-based graph abstraction (PAGA) generates a much
    simpler abstracted graph (*PAGA graph*) of partitions, in which edge weights
    represent confidence in the presence of connections. By tresholding this
    confidence in :func:`~scanpy.pl.paga`, a much simpler representation of the
    manifold data is obtained, which is nonetheless faithful to the topology of
    the manifold.
    
    The confidence should be interpreted as the ratio of the actual versus the
    expected value of connetions under the null model of randomly connecting
    partitions. We do not provide a p-value as this null model does not
    precisely capture what one would consider "connected" in real data, hence it
    strongly overestimates the expected value. See an extensive discussion of
    this in [Wolf19]_.
    
    .. note::
        Note that you can use the result of :func:`~scanpy.pl.paga` in
        :func:`~scanpy.tl.umap` and :func:`~scanpy.tl.draw_graph` via
        `init_pos='paga'` to get single-cell embeddings that are typically more
        faithful to the global topology.
    
    Parameters
    ----------
    adata
        An annotated data matrix.
    groups
        Key for categorical in `adata.obs`. You can pass your predefined groups
        by choosing any categorical annotation of observations. Default:
        The first present key of `'leiden'` or `'louvain'`.
    use_rna_velocity
        Use RNA velocity to orient edges in the abstracted graph and estimate
        transitions. Requires that `adata.uns` contains a directed single-cell
        graph with key `['velocity_graph']`. This feature might be subject
        to change in the future.
    model
        The PAGA connectivity model.
    copy
        Copy `adata` before computation and return a copy. Otherwise, perform
        computation inplace and return `None`.
    
    Returns
    -------
    **connectivities** : :class:`numpy.ndarray` (adata.uns['connectivities'])
        The full adjacency matrix of the abstracted graph, weights correspond to
        confidence in the connectivities of partitions.
    **connectivities_tree** : :class:`scipy.sparse.csr_matrix` (adata.uns['connectivities_tree'])
        The adjacency matrix of the tree-like subgraph that best explains
        the topology.
    
    Notes
    -----
    Together with a random walk-based distance measure
    (e.g. :func:`scanpy.tl.dpt`) this generates a partial coordinatization of
    data useful for exploring and explaining its variation.
    
    .. currentmodule:: scanpy
    
    See Also
    --------
    pl.paga
    pl.paga_path
    pl.paga_compare

我们看到use_rna_velocity参数，scanpy也可以用rna速率数据啊。

假装我们已经做好了细胞定义：

adata.obs['leiden_anno'] = adata.obs['leiden']
adata.obs['leiden_anno'].cat.categories = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10/Ery', '11', '12',
       '13', '14', '15', '16/Stem', '17', '18', '19/Neu', '20/Mk', '21', '22/Baso', '23', '24/Mo','25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35']
sc.tl.paga(adata, groups='leiden_anno')

sc.pl.paga(adata, threshold=0.03, show=False)

用PAGA的结果重新计算嵌入，也就是使得细胞的结构与PAGA的结构一致：

sc.tl.draw_graph(adata, init_pos='paga')
sc.pl.draw_graph(adata, color=['leiden_anno','MEGF6', 'MT-CO2', 'AKR7A3'], legend_loc='on data')

drawing single-cell graph using layout 'fa'
WARNING: Package 'fa2' is not installed, falling back to layout 'fr'.To use the faster and better ForceAtlas2 layout, install package 'fa2' (`pip install fa2`).

    finished: added
    'X_draw_graph_fr', graph_drawing coordinates (adata.obsm) (0:00:18)

当然这个图并不是我们期望，看上去依然略显混乱，根据waring的提示也许我应该pip install fa2,于是我们就pip install fa2,再来看一下结果：

forceatlas2将Gephi的Force Atlas 2布局算法移植到python2和python3(带有NetworkX和igraph的包装)。这是最快的python实现，大部分功能已经完成。它还支持Barnes Hut的最大加速近似值。ForceAtlas2是一种非常快速的面向力的图形布局算法。它用于在2D中对一个加权的无向图进行空间化(边缘权值定义了连接的强度)。实现基于本文和相应的gephi-java代码。与networkx的fruchterman reingold算法(spring layout)相比，它确实非常快，并且可以很好地扩展到大量节点(>10000)。

回到我们的help(sc.tl.draw_graph)，看到这layout的选项支持不同的igraph layout主题：

layout
        'fa' (`ForceAtlas2`) or any valid `igraph layout
        <http://igraph.org/c/doc/igraph-Layout.html>`__. Of particular interest
        are 'fr' (Fruchterman Reingold), 'grid_fr' (Grid Fruchterman Reingold,
        faster than 'fr'), 'kk' (Kamadi Kawai', slower than 'fr'), 'lgl' (Large
        Graph, very fast), 'drl' (Distributed Recursive Layout, pretty fast) and
        'rt' (Reingold Tilford tree layout).

sc.tl.draw_graph(adata, init_pos='paga',layout= 'lgl')
sc.pl.draw_graph(adata, color=['leiden_anno','MEGF6', 'MT-CO2', 'AKR7A3'], legend_loc='on data')

sc.tl.draw_graph(adata, init_pos='paga',layout= 'kk')
sc.pl.draw_graph(adata, color=['leiden_anno','MEGF6', 'MT-CO2', 'AKR7A3'], legend_loc='on data')

我们还是用fa吧。我们不禁会想，monocle的layout是什么呢，为什么看起来那么直白？

pl.figure(figsize=(8, 2))
for i in range(28):
    pl.scatter(i, 1, c=sc.pl.palettes.zeileis_28[i], s=200)
pl.show()

zeileis_colors = np.array(sc.pl.palettes.zeileis_28)
new_colors = np.array(adata.uns['leiden_anno_colors'])

new_colors[[16]] = zeileis_colors[[12]]  # Stem colors / green
new_colors[[10, 17, 5, 3, 15, 6, 18, 13, 7, 12]] = zeileis_colors[[5, 5, 5, 5, 11, 11, 10, 9, 21, 21]]  # Ery colors / red
new_colors[[20, 8]] = zeileis_colors[[17, 16]]  # Mk early Ery colors / yellow
new_colors[[4, 0]] = zeileis_colors[[2, 8]]  # lymph progenitors / grey
new_colors[[22]] = zeileis_colors[[18]]  # Baso / turquoise
new_colors[[19, 14, 2]] = zeileis_colors[[6, 6, 6]]  # Neu / light blue
new_colors[[24, 9, 1, 11]] = zeileis_colors[[0, 0, 0, 0]]  # Mo / dark blue
new_colors[[21, 23]] = zeileis_colors[[25, 25]]  # outliers / grey

adata.uns['leiden_anno_colors'] = new_colors
sc.pl.paga_compare(
    adata, threshold=0.03, title='', right_margin=0.2, size=10, edge_width_scale=0.5,
    legend_fontsize=12, fontsize=12, frameon=False, edges=True, save=True)

接下来我们在这个细胞图谱上绘制拟时信息，虽说是拟时推断，却需要指定一个亚群作为起点。但是真正的样本中有时并不存在一个明确的起点，有时是同时发生发育的。尽管每个生命从长远来看所有的细胞都来自一个细胞。这个起点如何确定，就像牛顿的宇宙第一推动者的问题一样，必须跳出数据本身来思考它。这里我们为了完成任务还是选择一个发育的起点吧。

adata.uns['iroot'] = np.flatnonzero(adata.obs['leiden_anno']  == '16/Stem')[0]
sc.tl.dpt(adata)

adata
Out[332]: 
AnnData object with n_obs × n_vars = 5025 × 1000 
    obs: 'n_counts_all', 'leiden', 'leiden_anno', 'dpt_pseudotime'
    var: 'gene_ids', 'feature_types', 'n_counts'
    uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors', 'umap', 'diffmap_evals', 'paga', 'leiden_sizes', 'leiden_anno_sizes', 'leiden_anno_colors', 'iroot'
    obsm: 'X_pca', 'X_draw_graph_fa', 'X_umap', 'X_tsne', 'X_diffmap', 'X_draw_graph_lgl', 'X_draw_graph_kk'
    varm: 'PCs'

adata.obs['dpt_pseudotime']
Out[333]: 
AAACCCAAGCGTATGG-1    0.661545
AAACCCAGTCCTACAA-1    0.670825
AAACCCATCACCTCAC-1    0.000000
AAACGCTAGGGCATGT-1    0.839952
AAACGCTGTAGGTACG-1    0.816287
  
TTTGTTGCAGGTACGA-1    0.826858
TTTGTTGCAGTCTCTC-1    0.792751
TTTGTTGGTAATTAGG-1    0.329366
TTTGTTGTCCTTGGAA-1    0.905214
TTTGTTGTCGCACGAC-1    0.771131
Name: dpt_pseudotime, Length: 5025, dtype: float32

我们不禁思考，既然每个生命从长远来看所有的细胞都来自一个细胞，是不是在一套数据集中可以设置一个遥远的点（实际本不存在的）作为发育的起点呢？这样是不是更能反映轨迹推断的实际呢？据我所知好像还没有这样的算法出现。

sc.tl.draw_graph(adata, init_pos='paga')
sc.pl.draw_graph(adata, color=['leiden_anno', 'dpt_pseudotime'], legend_loc='on data')

可以自定义发育路径

paths = [('erythrocytes', [16, 12, 7, 13, 18, 6, 5, 10]),
         ('neutrophils', [16, 0, 4, 2, 14, 19]),
         ('monocytes', [16, 0, 4, 11, 1, 9, 24])]
adata.obs['distance'] = adata.obs['dpt_pseudotime']
adata.obs['clusters'] = adata.obs['leiden_anno'] 
adata.uns['clusters_colors'] = adata.uns['leiden_anno_colors']
gene_names = adata.var_names[1:10]

adata
Out[338]: 
AnnData object with n_obs × n_vars = 5025 × 1000 
    obs: 'n_counts_all', 'leiden', 'leiden_anno', 'dpt_pseudotime', 'distance', 'clusters'
    var: 'gene_ids', 'feature_types', 'n_counts'
    uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors', 'umap', 'diffmap_evals', 'paga', 'leiden_sizes', 'leiden_anno_sizes', 'leiden_anno_colors', 'iroot', 'clusters_colors'
    obsm: 'X_pca', 'X_draw_graph_fa', 'X_umap', 'X_tsne', 'X_diffmap', 'X_draw_graph_lgl', 'X_draw_graph_kk'
    varm: 'PCs'

_, axs = pl.subplots(ncols=3, figsize=(6, 2.5), gridspec_kw={'wspace': 0.05, 'left': 0.12})
pl.subplots_adjust(left=0.05, right=0.98, top=0.82, bottom=0.2)
for ipath, (descr, path) in enumerate(paths):
    _, data = sc.pl.paga_path(
        adata, path, gene_names,
        show_node_names=False,
        ax=axs[ipath],
        ytick_fontsize=12,
        left_margin=0.15,
        n_avg=50,
        annotations=['distance'],
        show_yticks=True if ipath==0 else False,
        show_colorbar=False,
        color_map='Greys',
        groups_key='clusters',
        color_maps_annotations={'distance': 'viridis'},
        title='{} path'.format(descr),
        return_data=True,
        show=False)
    data.to_csv('./write/paga_path_{}.csv'.format(descr))
pl.savefig('./figures/paga_path_paul15.pdf')
pl.show()

仔细看看这个图有没有想起来monocle的，基因在不同命运中的变化情况：

paga

单细胞转录组数据分析|| scanpy教程：PAGA轨迹推断