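---I don't produce code; I just haul it around.
Today let's go through this new tool, CellRank. One of its big advantages is that it works whether or not you have splicing data. How well it performs I am still evaluating on my own data.
First, the analysis with splicing data. Load the packages and read in the data: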
import scvelo as scv
import scanpy as sc
import cellrank as cr
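import numpy as np
import pandas as pd        # used below to read the .mtx files
from scipy import sparse   # used below to build the sparse layers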
adata = sc.read_h5ad('/home/Documents/integrated_20L_with_splicing.h5ad')
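If you need to convert from Seurat, I recommend sceasy. It supports quite a few formats and handles most conversions; Seurat to AnnData definitely works (the reverse direction currently has some problems). Personally I find it better than that junk SeuratDisk.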
https://github.com/cellgeni/sceasy
Do the Seurat conversion in R:
## Seurat to AnnData
sceasy::convertFormat(scRNA,
    from = "seurat", to = "anndata",
    outFile = "scRNA.h5ad",
    drop_single_values = FALSE
)
Once the conversion is done, load the file in Python and add the splicing data to it.
You could also read the matrices with numpy, but pandas loads them faster.
adata = scv.read("scRNA.h5ad")
## add splicing info to the anndata object
## note: the cell and gene order in the .mtx files must match adata.obs_names / adata.var_names
path='/home/Downloads/star_output/'
## skiprows=2 skips the two header comment lines; pandas then uses the dimension line as the column header
spliced=pd.read_csv(path+'Velocyto/filtered/spliced.mtx',
                    skiprows=2, delimiter=' ',dtype=float).values
## dimension line: number of genes, number of cells (the third entry, the non-zero count, is dropped)
shape = np.loadtxt(path+'Velocyto/filtered/spliced.mtx', skiprows=2, max_rows = 1 ,delimiter=' ')[0:2].astype(int)
## .mtx indices are 1-based, hence the -1; .T turns genes x cells into cells x genes
adata.layers['spliced']=sparse.csr_matrix((spliced[:,2], (spliced[:,0]-1, spliced[:,1]-1)), shape = (shape)).tocsr().T
unspliced=pd.read_csv(path+'Velocyto/filtered/unspliced.mtx',
                      skiprows=2, delimiter=' ',dtype=float).values
adata.layers['unspliced']=sparse.csr_matrix((unspliced[:,2], (unspliced[:,0]-1, unspliced[:,1]-1)), shape = (shape)).tocsr().T
ambiguous= pd.read_csv(path+'Velocyto/filtered/ambiguous.mtx',
                       skiprows=2, delimiter=' ',dtype=float).values
adata.layers['ambiguous']=sparse.csr_matrix((ambiguous[:,2], (ambiguous[:,0]-1, ambiguous[:,1]-1)), shape = (shape)).tocsr().T
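As an alternative to parsing the .mtx files by hand, here is a minimal sketch using scipy.io.mmread, which reads the Matrix Market header and dimensions itself. The helper name load_velocyto_layer and the tiny demo matrix are made up for illustration; I am assuming the files are genes x cells, as in the code above.

```python
import os
import tempfile

from scipy import io, sparse


def load_velocyto_layer(mtx_path):
    """Read a Matrix Market file (genes x cells) and return a cells x genes CSR matrix."""
    mat = io.mmread(mtx_path)  # header and dimension line are parsed for us
    return sparse.csr_matrix(mat).T.tocsr()


# Tiny self-contained demo: write a 3-gene x 2-cell matrix and read it back.
demo = sparse.coo_matrix(([5.0, 2.0, 7.0], ([0, 2, 1], [0, 0, 1])), shape=(3, 2))
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "spliced.mtx")
    io.mmwrite(demo_path, demo)
    layer = load_velocyto_layer(demo_path)

print(layer.shape)  # cells x genes
```

With real data this would be used as adata.layers['spliced'] = load_velocyto_layer(path + 'Velocyto/filtered/spliced.mtx'), and likewise for unspliced and ambiguous, still assuming the cell and gene order matches the AnnData object.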
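Then carry on with processing. If the data was not normalized in Seurat, you can follow the standard scanpy procedure below; if it was already normalized, skip this step.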
# optional preprocessing
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
sc.tl.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata)
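To decide whether the optional preprocessing above is needed, one rough check is whether the expression matrix still holds integer counts. The sketch below is only a heuristic of my own (the function name looks_like_raw_counts and the n_check parameter are hypothetical); it is meant to be called on something like adata.X.

```python
import numpy as np
from scipy import sparse


def looks_like_raw_counts(X, n_check=1000):
    """Heuristic: True if the first n_check rows hold only non-negative integers."""
    block = X[:n_check]
    values = block.data if sparse.issparse(block) else np.asarray(block).ravel()
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))


# Toy matrices standing in for adata.X before and after normalization.
raw = sparse.csr_matrix(np.array([[0, 3, 1], [2, 0, 0]], dtype=float))
normalized = np.log1p(raw.toarray() / 2.0)

print(looks_like_raw_counts(raw))         # True
print(looks_like_raw_counts(normalized))  # False
```

If the check returns False, the data was most likely already normalized or log-transformed, and the preprocessing step can be skipped.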
Start scVelo
scv.pp.filter_genes(adata, min_shared_counts=20)
scv.pp.filter_genes_dispersion(adata, n_top_genes=4000)
# KNN-imputation using scVelo's moments function
scv.pp.moments(adata, n_pcs=20, n_neighbors=50)
scv.tl.recover_dynamics(adata,n_jobs=18)
scv.tl.velocity(adata,mode='dynamical') ## or mode='stochastic'
scv.tl.velocity_graph(adata,n_jobs=12)
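For intuition about the "KNN-imputation" step, here is a toy sketch of first-order moments: each cell's expression is replaced by the mean over itself and its nearest neighbors. This is a simplification and not scVelo's implementation; scv.pp.moments works on the spliced and unspliced layers with the neighbor graph it computes, and stores the results as Ms and Mu. The function name first_moments and the 1-NN graph are made up for the example.

```python
import numpy as np


def first_moments(X, neighbors):
    """Average each cell's expression over itself and its listed neighbors.

    X: (n_cells, n_genes) array; neighbors: (n_cells, k) array of neighbor indices.
    """
    n_cells = X.shape[0]
    W = np.eye(n_cells)                        # every cell counts itself
    rows = np.repeat(np.arange(n_cells), neighbors.shape[1])
    W[rows, neighbors.ravel()] = 1.0           # plus its k neighbors
    W /= W.sum(axis=1, keepdims=True)          # row-normalize so W @ X is a mean
    return W @ X


X = np.array([[0.0, 4.0], [2.0, 0.0], [4.0, 2.0]])
neighbors = np.array([[1], [0], [1]])          # hypothetical 1-NN graph
Ms = first_moments(X, neighbors)
print(Ms)
```

Smoothing like this reduces the dropout noise in the spliced and unspliced counts before the dynamics are fitted, which is why a larger n_neighbors gives smoother but less local estimates.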
Take a look at the results
# velocity streamplot
scv.pl.velocity_embedding_stream(adata, basis="umap", color="seurat_clusters", dpi=200)
## phase portraits
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index
scv.pl.scatter(adata, basis=top_genes[:15], color="seurat_clusters",ncols=3, frameon=False)
Now for CellRank itself
## cellrank, takes a long long long time
cr.tl.terminal_states(adata, cluster_key="seurat_clusters", weight_connectivities=0.2)
## terminal states
cr.pl.terminal_states(adata)
## Identify initial states, takes a long long time
cr.tl.initial_states(adata, cluster_key="seurat_clusters")
cr.pl.initial_states(adata, discrete=True)
## Compute fate maps
cr.tl.lineages(adata)
cr.pl.lineages(adata, same_plot=False)
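As I understand it, the fate maps are absorption probabilities on a Markov chain over cells: for each cell, the probability that a random walk ends up in each terminal state. The toy sketch below shows that computation on a 4-state chain with numpy. It is an illustration of the idea only, not CellRank's code; the function name and the transition matrix are made up.

```python
import numpy as np


def absorption_probabilities(T, terminal):
    """Probability that a walk from each transient state ends in each terminal state."""
    terminal = np.asarray(terminal)
    transient = np.setdiff1d(np.arange(T.shape[0]), terminal)
    Q = T[np.ix_(transient, transient)]   # transient -> transient transitions
    R = T[np.ix_(transient, terminal)]    # transient -> terminal transitions
    # Solve (I - Q) B = R instead of forming the inverse explicitly.
    B = np.linalg.solve(np.eye(len(transient)) - Q, R)
    return transient, B


# Toy 4-state chain: states 2 and 3 are absorbing (terminal).
T = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
transient, B = absorption_probabilities(T, terminal=[2, 3])
print(B)  # one row per transient state, one column per terminal state
```

Each row of B sums to 1, which is why the result can be drawn as a pie of fates per cell or per cluster.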
PAGA
scv.tl.recover_latent_time(adata, root_key="initial_states_probs", end_key="terminal_states_probs")
#use the inferred pseudotime to compute the directed PAGA.
scv.tl.paga(
    adata,
    groups="seurat_clusters",  # same cluster key as used above
    root_key="initial_states_probs",
    end_key="terminal_states_probs",
    use_time_prior="velocity_pseudotime",
)
Plot
cr.pl.cluster_fates(
    adata,
    mode="paga_pie",
    cluster_key="seurat_clusters",  # same cluster key as used above
    basis="umap",
    legend_kwargs={"loc": "top right"},
    legend_loc="top left",
    node_size_scale=2,
    edge_width_scale=1,
    max_edge_width=2,
    title="directed PAGA",
)