Scanpy分析单细胞数据:轨迹分析PAGA


Scanpy数据结构:AnnData
Scanpy分析单细胞数据:预处理和聚类


PAGA全称Partition-based graph abstraction(基于分区的图抽象)。在一种类似于图聚类的方法中,PAGA生成数据的最近邻图,然后生成细胞的分组,连接细胞之间的连接比随机期望的更多的组,从而构建数据的摘要图。由于“短路”在细胞群之间的连接比在单个细胞之间的连接更容易识别,所以PAGA排除了细胞群之间的虚假连接。

1. 导入数据

使用预处理和聚类中保存的注释好的pbmc3k.h5ad文件来做拟时序分析。(原则上PBMC的数据不推荐用来做拟时序,这里仅做演示)

安装scanpy的时候需要使用pip install 'scanpy[louvain]',直接使用pip install scanpy后面跑sc.tl.paga(adata, groups='louvain')会报错。

import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
from matplotlib import rcParams
import scanpy as sc
sc.settings.verbosity = 3  # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_versions()
results_file = './pbmc3k_pseudo.h5ad'
sc.settings.set_figure_params(dpi=80, frameon=False, figsize=(3, 3), facecolor='white')  # low dpi (dots per inch) yields small inline figures
adata = sc.read_h5ad("pbmc3k.h5ad")
adata
adata.X = adata.X.astype('float64') 
2. 预处理和可视化
sc.tl.draw_graph(adata)
sc.pl.draw_graph(adata, color='leiden', legend_loc='on data',title = "")
#pl.savefig("draw_graph.pdf")
看起来比较混乱,但是这个降维图是怎么来的呢?

让我们来看看sc.tl.draw_graph()这个函数的帮助文档

help(sc.tl.draw_graph)
<pre style="caret-color: rgb(0, 0, 0); color: var(--jp-content-font-color1); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none; box-sizing: unset; font-family: var(--jp-code-font-family); font-size: var(--jp-code-font-size); line-height: normal; border: none; margin: 0px; padding: 0px; overflow: auto; word-break: break-all; overflow-wrap: break-word; white-space: pre-wrap; font-variant-ligatures: normal; orphans: 2; text-align: left; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;">Help on function draw_graph in module scanpy.tools._draw_graph:

draw_graph(adata: anndata._core.anndata.AnnData, layout: Literal['fr', 'drl', 'kk', 'grid_fr', 'lgl', 'rt', 'rt_circular', 'fa'] = 'fa', init_pos: Union[str, bool, NoneType] = None, root: Union[int, NoneType] = None, random_state: Union[NoneType, int, numpy.random.mtrand.RandomState] = 0, n_jobs: Union[int, NoneType] = None, adjacency: Union[scipy.sparse.base.spmatrix, NoneType] = None, key_added_ext: Union[str, NoneType] = None, neighbors_key: Union[str, NoneType] = None, obsp: Union[str, NoneType] = None, copy: bool = False, **kwds)
    Force-directed graph drawing [Islam11]_ [Jacomy14]_ [Chippada18]_.

    An alternative to tSNE that often preserves the topology of the data
    better. This requires to run :func:`~scanpy.pp.neighbors`, first.

    The default layout ('fa', `ForceAtlas2`) [Jacomy14]_ uses the package |fa2|_
    [Chippada18]_, which can be installed via `pip install fa2`.

    `Force-directed graph drawing`_ describes a class of long-established
    algorithms for visualizing graphs.
    It has been suggested for visualizing single-cell data by [Islam11]_.
    Many other layouts as implemented in igraph [Csardi06]_ are available.
    Similar approaches have been used by [Zunder15]_ or [Weinreb17]_.

    .. |fa2| replace:: `fa2`
    .. _fa2: [https://github.com/bhargavchippada/forceatlas2](https://github.com/bhargavchippada/forceatlas2)
    .. _Force-directed graph drawing: [https://en.wikipedia.org/wiki/Force-directed_graph_drawing](https://en.wikipedia.org/wiki/Force-directed_graph_drawing)

    Parameters
    ----------
    adata
        Annotated data matrix.
    layout
        'fa' (`ForceAtlas2`) or any valid `igraph layout
        <[http://igraph.org/c/doc/igraph-Layout.html>`__](http://igraph.org/c/doc/igraph-Layout.html%3E%60__). Of particular interest
        are 'fr' (Fruchterman Reingold), 'grid_fr' (Grid Fruchterman Reingold,
        faster than 'fr'), 'kk' (Kamadi Kawai', slower than 'fr'), 'lgl' (Large
        Graph, very fast), 'drl' (Distributed Recursive Layout, pretty fast) and
        'rt' (Reingold Tilford tree layout).
    root
        Root for tree layouts.
    random_state
        For layouts with random initialization like 'fr', change this to use
        different intial states for the optimization. If `None`, no seed is set.
    adjacency
        Sparse adjacency matrix of the graph, defaults to neighbors connectivities.
    key_added_ext
        By default, append `layout`.
    proceed
        Continue computation, starting off with 'X_draw_graph_`layout`'.
    init_pos
        `'paga'`/`True`, `None`/`False`, or any valid 2d-`.obsm` key.
        Use precomputed coordinates for initialization.
        If `False`/`None` (the default), initialize randomly.
    neighbors_key
        If not specified, draw_graph looks .obsp['connectivities'] for connectivities
        (default storage place for pp.neighbors).
        If specified, draw_graph looks
        .obsp[.uns[neighbors_key]['connectivities_key']] for connectivities.
    obsp
        Use .obsp[obsp] as adjacency. You can't specify both
        `obsp` and `neighbors_key` at the same time.
    copy
        Return a copy instead of writing to adata.
    **kwds
        Parameters of chosen igraph layout. See e.g. `fruchterman-reingold`_
        [Fruchterman91]_. One of the most important ones is `maxiter`.

        .. _fruchterman-reingold: [http://igraph.org/python/doc/igraph.Graph-class.html#layout_fruchterman_reingold](http://igraph.org/python/doc/igraph.Graph-class.html#layout_fruchterman_reingold)

    Returns
    -------
    Depending on `copy`, returns or updates `adata` with the following field.

    **X_draw_graph_layout** : `adata.obsm`
        Coordinates of graph layout. E.g. for layout='fa' (the default),
        the field is called 'X_draw_graph_fa'</pre>

这是tsne的一种代替方案,由fa2库完成,绘图用的是Force-directed_graph_drawing,说明文档中给出了相关的链接。

看一下umap的结果

sc.tl.umap(adata)
sc.pl.umap(adata, color=['leiden'])
3. Denoising the graph(可选步骤)
##去噪,扩散图空间表示
sc.tl.diffmap(adata)
sc.pp.neighbors(adata, n_neighbors=10, use_rep='X_diffmap')
sc.tl.draw_graph(adata)
sc.pl.draw_graph(adata, color='leiden', legend_loc='on data') #legend_loc='right margin'
#pl.savefig("draw_graph_diffmap.pdf")
额 还是有点乱
4. 聚类和PAGA
## sc.tl.louvain(adata, resolution=1.0) 前面已经做过这一步了,做了3再做4,重新聚类分的群的数目会和一开始不一样
sc.tl.paga(adata, groups='leiden')
sc.pl.paga(adata, color=['leiden', 'CD4', 'CD8A', 'CD14'])

这是一个无向的网络图,点的颜色代表不同分群,点的大小代表该群细胞数的大小,连线只有一种颜色,粗细代表两个群之间连接更密切。这里并没有一个拟时的概念,但是下面我们会在这个基础上推断出一个拟时结构。

adata.obs['leiden'].cat.categories
# Index(['0', '1', '2', '3', '4', '5', '6', '7'], dtype='object')
#       dtype='object')

我们根据已知marker基因识别细胞类别,可以将细胞类型的信息注释上去。当然我们前面已经注释好了,保存在adata.obs['leiden_anno']。

# adata.obs['leiden_anno'] = adata.obs['leiden']
# adata.obs['leiden_anno'].cat.categories = ['CD4 T', 'CD14 Monocytes','B', 'CD8 T','NK', 'FCGR3A Monocytes','Dendritic', 'Megakaryocytes']
sc.tl.paga(adata, groups='leiden_anno')
sc.pl.paga(adata, threshold=0.03, show=False)
#利用PAGA重新计算细胞之间的距离
sc.tl.draw_graph(adata, init_pos='paga')
sc.pl.draw_graph(adata, color=['leiden_anno','MS4A1', 'NKG7', 'PPBP'], legend_loc='on data')

查看颜色,可以自行定义颜色

pl.figure(figsize=(8, 2))
for i in range(28):
    pl.scatter(i, 1, c=sc.pl.palettes.zeileis_28[i], s=200)
pl.show()
zeileis_colors = np.array(sc.pl.palettes.zeileis_28)
new_colors = np.array(adata.uns['leiden_anno_colors'])
new_colors[[0]] = zeileis_colors[[12]]  # CD4 T colors / green
new_colors[[1]] = zeileis_colors[[5]]  # CD14 Monocytes colors / red
new_colors[[2]] = zeileis_colors[[17]]  # B colors / yellow
new_colors[[3]] = zeileis_colors[[2]]  # CD8 T / grey
new_colors[[4]] = zeileis_colors[[18]]  # NK / turquoise
new_colors[[5]] = zeileis_colors[[6]]  # FCGR3A Monocytes / light blue
new_colors[[6]] = zeileis_colors[[0]]  # Dendritic / dark blue
new_colors[[7]] = zeileis_colors[[25]]  # Megakaryocytes / grey
#new_colors[[10, 17, 5, 3, 15, 6, 18, 13, 7, 12]] = zeileis_colors[[5, 5, 5, 5, 11, 11, 10, 9, 21, 21]]  # CD14 Monocytes colors / red

adata.uns['leiden_anno_colors'] = new_colors
sc.pl.paga_compare(
    adata, threshold=0.03, title='', right_margin=0.2, size=10, edge_width_scale=0.5,
    legend_fontsize=12, fontsize=12, frameon=False, edges=True)
5. 针对给定的一组基因,沿PAGA路径重建基因变化

定义分化起点,计算每个细胞的拟时间,画拟时间分布(这儿我是随便取的B细胞作为root,请根据自身数据细胞类别选取)

adata.uns['iroot'] = np.flatnonzero(adata.obs['leiden_anno']  == 'B')[0] ##假设分化起点为B cells,当然自己分析的时候需要根据数据实际情况选择分化起点
sc.tl.dpt(adata)
sc.pl.draw_graph(adata, color=['leiden_anno', 'dpt_pseudotime'],  legend_loc='on data',title = ['','pseudotime'], frameon=True)
sc.pl.draw_graph(adata, color=['leiden', 'dpt_pseudotime'],  legend_loc='on data',title = ['','pseudotime'], frameon=True)

针对给定基因,沿PAGA路径重建基因变化

gene_names = ['IL7R', 'CD14', 'LYZ',  'MS4A1', 'CD8A', 'CD8B', 'FCGR3A', 'MS4A7','GNLY', 'NKG7', 'FCER1A', 'CST3']                     # 选取了一列marker 基因,要根据实际情况选取
data_raw = sc.read_10x_mtx('./data/filtered_gene_bc_matrices/hg19/', var_names='gene_symbols', cache=True)  
data_raw.var_names_make_unique()
sc.pp.filter_cells(data_raw, min_genes=200)   # 去除表达基因200以下的细胞
sc.pp.filter_genes(data_raw, min_cells=3)     # 去除在3个细胞以下表达的基因
mito_genes=data_raw.var_names.str.startswith('MT-')
data_raw.obs['percent_mito']=np.sum(data_raw[:,mito_genes].X,axis=1).A1/np.sum(data_raw.X,axis=1).A1
data_raw.obs['n_counts']=data_raw.X.sum(axis=1).A1
data_raw = data_raw[data_raw.obs.n_genes < 2500, :]
data_raw = data_raw[data_raw.obs.percent_mito < 0.05, :]
sc.pp.normalize_total(data_raw, target_sum=1e4)
sc.pp.log1p(data_raw)
sc.pp.scale(data_raw)
adata.raw = data_raw #使用完整的原始数据进行可视化,处理方法和预处理和聚类里的一样
paths = [('NK', [2,0,3]), ('FCGR3A Monocytes', [2,6,1])] #自定义轨迹,以B细胞为起点,目标细胞为分化轨迹终点
adata.obs['distance'] = adata.obs['dpt_pseudotime']
adata.obs['clusters'] = adata.obs['leiden']  # just a cosmetic change
adata.uns['clusters_colors'] = adata.uns['leiden_anno_colors']
_, axs = pl.subplots(ncols=2, figsize=(6, 2.5), gridspec_kw={'wspace': 0.05, 'left': 0.12})
pl.subplots_adjust(left=0.05, right=0.98, top=0.82, bottom=0.2)
for ipath, (descr, path) in enumerate(paths):
    _, data = sc.pl.paga_path(
        adata, path, gene_names,
        show_node_names=False,
        ax=axs[ipath],
        ytick_fontsize=12,
        left_margin=0.15,
        n_avg=50,
        annotations=['distance'],
        show_yticks=True if ipath==0 else False,
        show_colorbar=False,
        color_map='Greys',
        groups_key='clusters',
        color_maps_annotations={'distance': 'viridis'},
        title='{} path'.format(descr),
        return_data=True,
        show=False)
    data.to_csv('./paga_path_{}.csv'.format(descr))
pl.savefig('paga_path_pbmc.pdf')
pl.show()

参考:https://scanpy-tutorials.readthedocs.io/en/latest/paga-paul15.html#Recomputing-the-embedding-using-PAGA-initialization

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
禁止转载,如需转载请通过简信或评论联系作者。
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 222,252评论 6 516
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 94,886评论 3 399
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 168,814评论 0 361
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 59,869评论 1 299
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 68,888评论 6 398
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 52,475评论 1 312
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 41,010评论 3 422
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 39,924评论 0 277
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 46,469评论 1 319
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 38,552评论 3 342
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 40,680评论 1 353
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 36,362评论 5 351
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 42,037评论 3 335
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 32,519评论 0 25
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 33,621评论 1 274
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 49,099评论 3 378
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 45,691评论 2 361

推荐阅读更多精彩内容