CellRank

---I don't write the code, I just carry it from place to place.

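Today let's take a look at this new tool. One of its big advantages is that it works both with and without splicing data. How well it performs I can't say yet; I'm still evaluating it on my own data.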

First, the analysis with splicing data. Load the packages and read in the data:

import scvelo as scv
import scanpy as sc
import cellrank as cr
import numpy as np
adata = sc.read_h5ad('/home/Documents/integrated_20L_with_splicing.h5ad')

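If you need to convert from Seurat, I recommend sceasy. It supports quite a few formats and most conversions work; Seurat to AnnData definitely does (the reverse direction has some problems for now). Personally I think it beats that piece of junk SeuratDisk.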
https://github.com/cellgeni/sceasy

Do the Seurat conversion in R:

## Seurat to AnnData
sceasy::convertFormat(scRNA,
    from = "seurat", to = "anndata",
    outFile = "scRNA.h5ad",
    drop_single_values = FALSE
)

Once the conversion is done, load the file in Python and add the splicing data.

You could also load the .mtx files with numpy, but pandas reads them faster.

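import pandas as pd
from scipy import sparse

adata = scv.read("scRNA.h5ad")
## add splicing info to the anndata object
## (assumes barcodes and genes are in the same order as in adata)
path = '/home/Downloads/star_output/'
# skiprows=2 skips the two comment lines written by STARsolo; read_csv then takes
# the dimension line as its header, leaving only the (gene, cell, count) triplets
spliced = pd.read_csv(path + 'Velocyto/filtered/spliced.mtx',
                      skiprows=2, delimiter=' ', dtype=float).values
# the dimension line itself: n_genes, n_cells (the third number is the entry count)
shape = np.loadtxt(path + 'Velocyto/filtered/spliced.mtx', skiprows=2, max_rows=1, delimiter=' ')[0:2].astype(int)
# .mtx indices are 1-based and the matrix is genes x cells, hence the -1 and the .T
adata.layers['spliced'] = sparse.csr_matrix((spliced[:, 2], (spliced[:, 0] - 1, spliced[:, 1] - 1)), shape=shape).tocsr().T
unspliced = pd.read_csv(path + 'Velocyto/filtered/unspliced.mtx',
                        skiprows=2, delimiter=' ', dtype=float).values
adata.layers['unspliced'] = sparse.csr_matrix((unspliced[:, 2], (unspliced[:, 0] - 1, unspliced[:, 1] - 1)), shape=shape).tocsr().T
ambiguous = pd.read_csv(path + 'Velocyto/filtered/ambiguous.mtx',
                        skiprows=2, delimiter=' ', dtype=float).values
adata.layers['ambiguous'] = sparse.csr_matrix((ambiguous[:, 2], (ambiguous[:, 0] - 1, ambiguous[:, 1] - 1)), shape=shape).tocsr().T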
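If you'd rather not handle the header lines and the 1-based indices by hand, `scipy.io.mmread` can parse Matrix Market files directly. Below is a minimal sketch on a made-up 3 genes x 2 cells file written to a temporary directory; the file contents and variable names are only for illustration, and I haven't benchmarked it against the pandas approach on large files.

```python
import os
import tempfile

from scipy import io, sparse

# A tiny 3 genes x 2 cells matrix in Matrix Market coordinate format,
# laid out like a STARsolo Velocyto file (values are made up).
mtx_text = """%%MatrixMarket matrix coordinate integer general
%
3 2 3
1 1 5
3 1 2
2 2 7
"""

with tempfile.TemporaryDirectory() as tmp:
    mtx_path = os.path.join(tmp, "spliced.mtx")
    with open(mtx_path, "w") as fh:
        fh.write(mtx_text)
    # mmread deals with the comment lines, the shape line and the 1-based indices
    genes_by_cells = io.mmread(mtx_path)

# AnnData layers are cells x genes, so transpose and store as CSR
cells_by_genes = sparse.csr_matrix(genes_by_cells.T)
print(cells_by_genes.shape)
print(cells_by_genes.toarray())
```

With real data the result would be assigned to `adata.layers['spliced']`, under the same assumption as above that cells and genes are ordered as in `adata`.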

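Then carry on with the processing. If the data was not normalized in Seurat, run the standard scanpy-style preprocessing below; if it was already normalized, skip this step.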

# optional preprocessing
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
sc.tl.pca(adata)
sc.pp.neighbors(adata)
# UMAP and Leiden clustering via scanpy
sc.tl.umap(adata)
sc.tl.leiden(adata)
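Not sure whether your matrix was already normalized in Seurat? A rough heuristic is that raw counts are non-negative integers, while normalized or log-transformed values are not. The helper below, `looks_like_raw_counts`, is a hypothetical name of my own and not part of scanpy or scVelo; it only inspects the first few thousand stored values.

```python
import numpy as np
from scipy import sparse


def looks_like_raw_counts(X, n_check=10_000):
    """Heuristic: True if the first n_check stored values are non-negative integers."""
    values = X.data if sparse.issparse(X) else np.asarray(X).ravel()
    values = values[:n_check]
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))


# Toy data: a raw count matrix and a log-normalized version of it
raw = sparse.csr_matrix(np.array([[0, 3, 1], [2, 0, 5]], dtype=float))
normalized = np.log1p(raw.toarray() / 2.0)

print(looks_like_raw_counts(raw))
print(looks_like_raw_counts(normalized))
```

On a real object you would call `looks_like_raw_counts(adata.X)` and run the optional preprocessing only if it returns True.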

Start scVelo

scv.pp.filter_genes(adata, min_shared_counts=20)
scv.pp.filter_genes_dispersion(adata, n_top_genes=4000)

# KNN-imputation using scVelo's moments function
scv.pp.moments(adata, n_pcs=20, n_neighbors=50)
scv.tl.recover_dynamics(adata,n_jobs=18) 
scv.tl.velocity(adata,mode='dynamical') ## or mode='stochastic'
scv.tl.velocity_graph(adata,n_jobs=12)
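To get a feeling for what a velocity value means, here is the simple steady-state idea for a single gene: fit a line through the origin relating unspliced to spliced counts, then read the velocity as each cell's distance from that line. This is a simplified illustration and not scVelo's implementation. The `dynamical` mode used above fits a full kinetic model, and scVelo's simpler modes fit the slope differently from the plain least squares over all cells used here. The function name and numbers are made up.

```python
import numpy as np


def steady_state_velocity(spliced, unspliced):
    """Return (gamma, velocity) for one gene.

    gamma is the slope of a least-squares line through the origin,
    velocity is the residual unspliced - gamma * spliced.
    """
    spliced = np.asarray(spliced, dtype=float)
    unspliced = np.asarray(unspliced, dtype=float)
    gamma = (spliced @ unspliced) / (spliced @ spliced)
    return gamma, unspliced - gamma * spliced


# Four cells, one gene (made-up values)
s = np.array([1.0, 2.0, 3.0, 4.0])
u = np.array([0.5, 1.0, 2.5, 1.0])
gamma, velocity = steady_state_velocity(s, u)
print(gamma)
print(velocity)
```

A positive value (more unspliced RNA than the line predicts) suggests the gene is being up-regulated in that cell, a negative one that it is being down-regulated.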

Take a look at the results

## velocity map as a streamplot
scv.pl.velocity_embedding_stream(adata, basis="umap", color="seurat_clusters", dpi=200)
##  phase portraits
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index
scv.pl.scatter(adata, basis=top_genes[:15], color="seurat_clusters",ncols=3, frameon=False)

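Now for CellRank proper. The cr.tl.* and cr.pl.* calls below belong to the CellRank 1.x API; CellRank 2 dropped them in favor of kernels and estimators.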

##cellrank, takes a long long long time
cr.tl.terminal_states(adata, cluster_key="seurat_clusters", weight_connectivities=0.2)
## terminal states
cr.pl.terminal_states(adata)
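About `weight_connectivities=0.2`: as I understand it, CellRank blends the velocity-based transition matrix with one built from the KNN connectivities, giving the latter a weight of 0.2 and the former 0.8. The sketch below shows that kind of convex combination on two made-up 3x3 row-stochastic matrices; `combine_kernels` is my own illustrative name, not a CellRank function.

```python
import numpy as np


def combine_kernels(t_velocity, t_connectivity, weight_connectivities=0.2):
    """Convex combination of two row-stochastic transition matrices."""
    w = float(weight_connectivities)
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight_connectivities must lie in [0, 1]")
    return (1.0 - w) * np.asarray(t_velocity) + w * np.asarray(t_connectivity)


# Made-up matrices: velocity pushes 0 -> 1 -> 2, connectivities are symmetric-ish
t_vel = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])
t_conn = np.array([[0.5, 0.5, 0.0],
                   [0.25, 0.5, 0.25],
                   [0.0, 0.5, 0.5]])

t = combine_kernels(t_vel, t_conn, weight_connectivities=0.2)
print(t)
print(t.sum(axis=1))
```

Because both inputs have rows summing to 1, so does the result. A higher weight smooths out noisy velocities at the cost of directionality.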

## Identify initial states, takes a long long time
cr.tl.initial_states(adata, cluster_key="seurat_clusters")
cr.pl.initial_states(adata, discrete=True)
## Compute fate maps
cr.tl.lineages(adata)
cr.pl.lineages(adata, same_plot=False)
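The fate map gives, for every cell, the probability of ending up in each terminal state. In Markov chain terms these are absorption probabilities. Here is a toy version on a 4-state chain with two absorbing states, solved with a dense linear system. It is only a sketch of the idea: on real data there are thousands of cells and each terminal state is a group of cells, so the actual computation is considerably more involved. The function name is mine.

```python
import numpy as np


def absorption_probabilities(T, absorbing):
    """Probability of ending in each absorbing state, per transient state.

    Solves (I - Q) X = R, where Q holds transient -> transient and
    R holds transient -> absorbing transition probabilities.
    """
    T = np.asarray(T, dtype=float)
    absorbing = list(absorbing)
    transient = [i for i in range(T.shape[0]) if i not in absorbing]
    Q = T[np.ix_(transient, transient)]
    R = T[np.ix_(transient, absorbing)]
    X = np.linalg.solve(np.eye(len(transient)) - Q, R)
    return transient, X


# States 0 and 3 are terminal; states 1 and 2 step left or right with equal odds
T = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])
transient, fate = absorption_probabilities(T, absorbing=[0, 3])
print(transient)
print(fate)
```

Each row of `fate` sums to 1; the state closer to a terminal state gets the higher probability of ending there.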

PAGA

scv.tl.recover_latent_time(adata, root_key="initial_states_probs", end_key="terminal_states_probs")

# use the latent time inferred above to compute the directed PAGA
scv.tl.paga(
    adata,
    groups="seurat_clusters",
    root_key="initial_states_probs",
    end_key="terminal_states_probs",
    use_time_prior="latent_time",
)

Plotting

cr.pl.cluster_fates(
    adata,
    mode="paga_pie",
    cluster_key="seurat_clusters",
    basis="umap",
    legend_kwargs={"loc": "top right"},
    legend_loc="top left",
    node_size_scale=2,
    edge_width_scale=1,
    max_edge_width=2,
    title="directed PAGA",
)
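In the `paga_pie` plot every cluster becomes a pie chart whose slices show the fate probabilities for that cluster. As far as I can tell these are the cell-level probabilities averaged within each cluster; the sketch below does that averaging by hand on made-up numbers. `mean_fate_per_cluster` is a hypothetical helper, not part of CellRank.

```python
import numpy as np


def mean_fate_per_cluster(fate_probs, clusters):
    """Average the per-cell fate probabilities within each cluster."""
    fate_probs = np.asarray(fate_probs, dtype=float)
    clusters = np.asarray(clusters)
    return {
        str(name): fate_probs[clusters == name].mean(axis=0)
        for name in np.unique(clusters)
    }


# Four cells, two terminal states, two clusters (made-up values)
fate_probs = np.array([[0.9, 0.1],
                       [0.7, 0.3],
                       [0.2, 0.8],
                       [0.4, 0.6]])
clusters = np.array(["0", "0", "1", "1"])

pies = mean_fate_per_cluster(fate_probs, clusters)
for name, slices in pies.items():
    print(name, slices)
```

Each averaged row still sums to 1, which is what makes it drawable as a pie.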
