单细胞转录组数据分析|| pypairs教程:细胞周期推断

单细胞转录组技术是来解释细胞状态的有力武器,在分群得到不同细胞类型的划分之前,有时候需要消除细胞周期的影响。毕竟同样的细胞类型也有着不同的状态,细胞周期就是状态之一。

在R中有scran以及seurat包均可推断以及回归掉细胞周期影响的函数。各方的引用几本来自这篇文献: Scialdone et al. (2015),感兴趣的同学可以学习一下。

在scran中已经附带了人和小鼠细胞周期相关的基因:


library(scater)

# Getting pre-trained marker sets
mm.pairs <- readRDS(system.file("exdata", "mouse_cycle_markers.rds", package="scran"))
head(mm.pairs$G1)
              first             second
1 ENSMUSG00000000001 ENSMUSG00000001785
2 ENSMUSG00000000001 ENSMUSG00000005470
3 ENSMUSG00000000001 ENSMUSG00000012443
4 ENSMUSG00000000001 ENSMUSG00000015120
5 ENSMUSG00000000001 ENSMUSG00000022033
6 ENSMUSG00000000001 ENSMUSG00000023015


hs.pairs <- readRDS(system.file("exdata", "human_cycle_markers.rds", package="scran"))
head(hs.pairs$G2M)

           first          second
1 ENSG00000100519 ENSG00000146457
2 ENSG00000100519 ENSG00000132646
3 ENSG00000100519 ENSG00000124171
4 ENSG00000123975 ENSG00000100519
5 ENSG00000126787 ENSG00000100519
6 ENSG00000161057 ENSG00000100519

推测细胞周期的状态(G1,S,G2M)是根据成对的基因表达变化来的。

而 seurat中是根据细胞周期基因来推断的:

 Seurat::cc.genes.updated.2019
$s.genes
 [1] "MCM5"     "PCNA"     "TYMS"     "FEN1"     "MCM7"     "MCM4"     "RRM1"     "UNG"      "GINS2"    "MCM6"     "CDCA7"   
[12] "DTL"      "PRIM1"    "UHRF1"    "CENPU"    "HELLS"    "RFC2"     "POLR1B"   "NASP"     "RAD51AP1" "GMNN"     "WDR76"   
[23] "SLBP"     "CCNE2"    "UBR7"     "POLD3"    "MSH2"     "ATAD2"    "RAD51"    "RRM2"     "CDC45"    "CDC6"     "EXO1"    
[34] "TIPIN"    "DSCC1"    "BLM"      "CASP8AP2" "USP1"     "CLSPN"    "POLA1"    "CHAF1B"   "MRPL36"   "E2F8"    

$g2m.genes
 [1] "HMGB2"   "CDK1"    "NUSAP1"  "UBE2C"   "BIRC5"   "TPX2"    "TOP2A"   "NDC80"   "CKS2"    "NUF2"    "CKS1B"   "MKI67"  
[13] "TMPO"    "CENPF"   "TACC3"   "PIMREG"  "SMC4"    "CCNB2"   "CKAP2L"  "CKAP2"   "AURKB"   "BUB1"    "KIF11"   "ANP32E" 
[25] "TUBB4B"  "GTSE1"   "KIF20B"  "HJURP"   "CDCA3"   "JPT1"    "CDC20"   "TTK"     "CDC25C"  "KIF2C"   "RANGAP1" "NCAPD2" 
[37] "DLGAP5"  "CDCA2"   "CDCA8"   "ECT2"    "KIF23"   "HMMR"    "AURKA"   "PSRC1"   "ANLN"    "LBR"     "CKAP5"   "CENPE"  
[49] "CTCF"    "NEK2"    "G2E3"    "GAS2L3"  "CBX5"    "CENPA"  

那,今天和大家介绍一种类似scran的在python中实现的方法:pypairs

PyPairs - A python scRNA-Seq classifier

# -*- coding: utf-8 -*-
"""
Created on Sat Mar 28 22:45:55 2020
# https://pypairs.readthedocs.io/en/latest/welcome.html
@author: Administrator
"""

from pypairs import pairs, datasets

# Load samples from the oscope scRNA-Seq dataset with known cell cycle
training_data = datasets.leng15(mode='sorted')
training_data.obs
Out[155]: 
            category
G2_Exp1.059      G2M
G2_Exp1.069      G2M
G2_Exp1.075      G2M
G2_Exp1.063      G2M
G2_Exp1.029      G2M
             ...
G1_Exp1.063       G1
G1_Exp1.083       G1
G1_Exp1.030       G1
G1_Exp1.018       G1
G1_Exp1.046       G1
 help(datasets.leng15)
Help on function leng15 in module pypairs.datasets:

leng15(mode: Union[str, NoneType] = 'all', gene_sub: Union[Iterable[int], NoneType] = None, sample_sub: Union[Iterable[int], NoneType] = None) -> anndata._core.anndata.AnnData
    Single cell RNA-seq data of human hESCs to evaluate Oscope [Leng15]_
    
    Total 213 H1 single cells and 247 H1-Fucci single cells were sequenced.
    The 213 H1 cells were used to evaluate Oscope in identifying oscillatory genes.
    The H1-Fucci cells were used to confirm the cell cycle gene cluster identified
    by Oscope in the H1 hESCs.
    Normalized expected counts are provided in GSE64016_H1andFUCCI_normalized_EC.csv.gz
    
    Reference
    ---------
        GEO-Dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE64016
    
    Parameters
    ----------
    mode
        sample selection mode:
            - 'all' for all samples, default
            - 'sorted' for all samples with known cell cycle (G2, S or G1)
            - 'unsorted' for all samples with unknown cell cycle (H1)
    gene_sub
        Index based array of subsetted genes
    sample_sub
        Index based array of subsetted samples
    
    Returns
    -------
    adata : :class:`~anndata.AnnData`
        Annotated data matrix containing the normalized gene counts

获得细胞周期相关基因对,类似scran的sandbag:

# Run sandbag() to identify marker pairs
marker_pairs = pairs.sandbag(training_data, fraction=0.3)
marker_pairs['G2M'][1:10]
Out[166]: 
[('KAT5', 'SYNJ2BP-COX16'),
 ('USF1', 'KIAA0040'),
 ('CAMK2N1', 'ST3GAL6'),
 ('CAMK2N1', 'TMEM108'),
 ('CAMK2N1', 'PAK3'),
 ('CAMK2N1', 'ZNF789'),
 ('CAMK2N1', 'NBPF11'),
 ('CAMK2N1', 'RHOBTB1'),
 ('CAMK2N1', 'KCNG3')]

读入我们的数据

import scanpy as sc 
results_file = 'E:\learnscanpy\write\pbmc3k.h5ad'
sdata = sc.read_h5ad(results_file)
sdata

AnnData object with n_obs × n_vars = 2223 × 2208 
    obs: 'n_genes', 'percent_mito', 'n_counts', 'percent_mito1', 'leiden'
    var: 'gene_ids', 'feature_types', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'leiden', 'leiden_colors', 'leiden_sizes', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'

sdata.obs

                    n_genes  percent_mito  ...  percent_mito1          leiden
AAACGCTGTGTGATGG-1     2348      0.099821  ...     998.205811     NewCluster3
AAACGCTTCTAGGCAT-1     2470      0.065564  ...     655.640869       Dendritic
AAAGAACCACCGTCTT-1      420      0.021944  ...     219.435730              NK
AAAGAACTCACCTGGG-1     2479      0.074067  ...     740.672119               B
AAAGAACTCGGAACTT-1     2318      0.079713  ...     797.133240       Dendritic
                    ...           ...  ...            ...             ...
TTTGGTTTCCGGTAAT-1     2277      0.079324  ...     793.240723           CD8 T
TTTGTTGCAGGTACGA-1     1986      0.071226  ...     712.257751  CD14 Monocytes
TTTGTTGCAGTCTCTC-1     2485      0.055394  ...     553.935852           CD8 T
TTTGTTGTCCTTGGAA-1     1998      0.081037  ...     810.366882  Megakaryocytes
TTTGTTGTCGCACGAC-1     2468      0.070987  ...     709.868225           CD8 T

[2223 rows x 5 columns]

进行预测:

sresult  = pairs.cyclone(sdata, marker_pairs)
print(sresult)

                      G2M      S     G1 max_class cc_prediction
AAACGCTGTGTGATGG-1  0.032  0.986  0.701         S            G1
AAACGCTTCTAGGCAT-1  0.848  0.970  0.543         S           G2M
AAAGAACCACCGTCTT-1  0.320  0.467  0.855        G1            G1
AAAGAACTCACCTGGG-1  0.190  0.846  0.288         S             S
AAAGAACTCGGAACTT-1  0.358  0.723  0.871        G1            G1
                  ...    ...    ...       ...           ...
TTTGGTTTCCGGTAAT-1  0.451  0.689  0.392         S             S
TTTGTTGCAGGTACGA-1  0.680  0.410  0.684        G1            G1
TTTGTTGCAGTCTCTC-1  0.544  0.763  0.278         S           G2M
TTTGTTGTCCTTGGAA-1  0.033  0.924  0.936        G1            G1
TTTGTTGTCGCACGAC-1  0.908  0.748  0.745       G2M           G2M

[2223 rows x 5 columns]

在umap空间可视化一下:

sc.tl.umap(sdata)
 sc.pl.umap(sdata,color = ['pypairs_cc_prediction','leiden'])

在scanpy中消除每种细胞周期得分的影响:

sc.pp.regress_out(sdata, ['n_counts', 'percent_mito','pypairs_G2M','pypairs_G1','pypairs_G1'])
sc.pp.scale(sdata, max_value=10)
sc.tl.pca(sdata, svd_solver='arpack')
sc.pp.neighbors(sdata, n_neighbors=10, n_pcs=40)
sc.tl.leiden(sdata)
sc.tl.umap(sdata)
sc.pl.umap(sdata,color = ['pypairs_cc_prediction','leiden'])

PyPairs - A python scRNA-Seq classifier
This is a python-reimplementation of the Pairs algorithm as described by A. Scialdone et. al. (2015). Original Paper available under: <https://doi.org/10.1016/j.ymeth.2015.06.021>

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 216,919评论 6 502
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 92,567评论 3 392
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 163,316评论 0 353
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,294评论 1 292
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,318评论 6 390
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,245评论 1 299
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 40,120评论 3 418
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,964评论 0 275
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,376评论 1 313
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,592评论 2 333
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,764评论 1 348
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,460评论 5 344
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 41,070评论 3 327
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,697评论 0 22
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,846评论 1 269
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,819评论 2 370
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,665评论 2 354

推荐阅读更多精彩内容