非模式物种的RNA速率分析(二)

loom文件生成后就是Seurat结果和loom文件的整合了
可以分为在R中Seurat提取分组及坐标信息,以及python中整合loom文本和Seurat的信息以及可视化。

高亮:这里主要是要注意loom文件中barcode样本名称前缀的和Seurat中Cell ID的前缀要吻合

参考:https://www.jianshu.com/p/70c19748ac1a

(一)提取Seruat中的信息

table(seurat_obj$celltype)
# 获得每个细胞的UMAP或TSNE坐标,使用 Embeddings函数
write.csv(Embeddings(seurat_obj, reduction = "umap"), file = "cell_embeddings.csv")
# 获取每个细胞的barcode
write.csv(Cells(seurat_obj), file = "cellID_obs.csv", row.names = FALSE)
# 提取每个细胞的cluster信息
write.csv(seurat_obj@meta.data[, 'cluster', drop = FALSE], file = "cell_clusters.csv")
# 提取每个细胞的celltype信息
write.csv(seurat_obj@meta.data[, 'celltype', drop = FALSE], file = "cell_celltype.csv")
# 获取celltype的颜色信息
hue_pal()(length(levels(seurat_obj$celltype)))
# 获取cluster的颜色信息
hue_pal()(length(levels(seurat_obj$cluster)))

(二)多个loom文件整合

import loompy
files=['sample_one','sample_two']
output_filename='combined.loom'
loompy.combine(files, output_filename, key="Accession")

(三) loom文件和Seurat合并

import scvelo as scv
import pandas as pd
import numpy as np
import os

loom_data = scv.read('combined.loom', cache=False)
#这里就是给loom文件中的barcode做替换,使其与Seurat中一致
loom_data.obs = loom_data.obs.rename(index = lambda x: x.replace(':', '_').replace('x', '-1').replace('*****','*****').replace('******','*****'))
loom_data.obs.head()
#读取seurat中的meta信息
meta_path = "/path/"
sample_obs = pd.read_csv(os.path.join(meta_path, "cellID_obs.csv"))
cell_umap= pd.read_csv(os.path.join(meta_path, "cell_embeddings.csv"), header=0, names=["CellID", "UMAP_1", "UMAP_2"])
cell_clusters = pd.read_csv(os.path.join(meta_path, "cell_clusters.csv"), header=0, names=["CellID", "cluster"])
cell_celltype = pd.read_csv(os.path.join(meta_path, "cell_celltype.csv"), header=0, names=["CellID", "celltype"])
#对细胞文件和RNA剪切速率文件取交集,保留关注的细胞类型
sample_one = loom_data[np.isin(loom_data.obs.index, sample_obs)]
sample_one.obs.head()
#构建umap坐标,cluster,celltype信息数据框
sample_one_index = pd.DataFrame(sample_one.obs.index)
sample_one_index = sample_one_index.rename(columns = {0:'CellID'})

umap_ordered = sample_one_index.merge(cell_umap, on = "CellID")
umap_ordered.head()
celltype_ordered = sample_one_index.merge(cell_celltype, on = "CellID")
celltype_ordered.head()
clusters_ordered = sample_one_index.merge(cell_clusters, on = "CellID")
clusters_ordered.head()
#将umap坐标与cluster信息加入sample_one
umap_ordered = umap_ordered.iloc[:,1:]
clusters_ordered = clusters_ordered.iloc[:,1:]
celltype_ordered = celltype_ordered.iloc[:,1:]
sample_one.obsm['X_umap'] = umap_ordered.values
sample_one.uns['clusters'] = clusters_ordered.values
sample_one.obs['celltype'] = celltype_ordered.values

adata = sample_one
# some gene labels are duplicated (Ensembl IDs are still unique!!)
adata.var_names_make_unique()

# save model to file
adata.write('bl_celltype_dynamicModel.h5ad', compression = 'gzip')

# read h5ad file
adata= scv.read('mbc_celltype_dynamicModel.h5ad')
#可视化
# Running RNA Velocity
scv.pp.filter_and_normalize(adata,min_shared_counts=30, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
scv.tl.velocity(adata, mode = "stochastic")
scv.tl.velocity_graph(adata)
scv.pl.velocity_embedding(adata, basis='X_umap', arrow_size=5)
ident_colours = ["#B2DF8A", "#FB9A99", "#33A02C", "#A6CEE3", "#1F78B4"]
scv.pl.velocity_embedding_stream(adata, basis='X_umap',color = "celltype", palette = ident_colours, size = 20,alpha =0.8, save="UMAP_stream.pdf", figsize=(7,5), legend_fontsize = 9, show=False, title='')

Python比R对于RNA速率分析的软件更兼容,虽然我python基础非常之薄弱,但是基本代码原理都是相同的,但之后还是好好学习python了。

更新之后会出现一个报错

TypeError: Neighbors.compute_neighbors() got an unexpected keyword argument 'write_knn_indices'
github上给出的答案是将scv替换成sc

import scanpy as sc
sc.pp.pca(adata)
sc.pp.neighbors(adata, n_pcs=30, n_neighbors=30)
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容