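More and more analyses now rely on loom files, mainly Python-based data analysis and RNA velocity analysis, so converting between formats is something we need to work through. Here I share how to handle loom files in both R and Python.
Reading loom files in R
The main tool for reading loom files in R is velocyto.R, the RNA velocity analysis package.
Note, however, that reading loom files in R requires hdf5r, which is fairly hard to install and may take some digging. Here is one example.
Installing hdf5r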
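### Build and install the HDF5 library that hdf5r depends on
curl -O https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.1/src/hdf5-1.10.1.tar
tar -xf hdf5-1.10.1.tar
cd hdf5-1.10.1
./configure
make -j4
make check
make install
### Then install the R package itself from within R: install.packages("hdf5r")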
library(velocyto.R)
# input.loom holds the path to the loom file to read
data <- read.loom.matrices(input.loom)
Creating loom files in R
Converting rds to loom
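library(SCopeLoomR)
library(Seurat)
# outdir, rds and barcode (path to a CSV file) are assumed to be defined beforehand
# create the output directory
if (!dir.exists(paste0(outdir,'/input'))){
  dir.create(paste0(outdir,'/input'))
}
data1 <- readRDS(rds)
barcode <- read.table(barcode,sep=',',header = TRUE,stringsAsFactors = FALSE)
# seurat3_nonsingle and single: take the raw counts from the RNA assay
data2 <- as.matrix(data1@assays$RNA@counts)
# keep only the cells listed in the Barcode column, dropping barcodes not found in the matrix
m <- na.omit(match(barcode$Barcode,colnames(data2)))
data3 <- data2[,m]
# build the loom file
setwd(paste0(outdir,'/input'))
build_loom(file.name="original.loom",dgem=data3)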
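The barcode step above (`match` followed by `na.omit`) keeps only the requested barcodes that exist in the matrix, in the order they were requested. For readers moving between R and Python, here is a minimal Python sketch of the same logic. The function name `match_barcodes` and the toy barcodes are made up for illustration and do not come from any package; note that Python indices start at 0 while R's start at 1.

```python
def match_barcodes(wanted, available):
    """Return column indices of `wanted` barcodes found in `available`.

    Mirrors R's na.omit(match(wanted, available)): the order follows
    `wanted`, barcodes missing from `available` are dropped, and only the
    first occurrence of a duplicated column name is used.
    """
    first_index = {}
    for i, name in enumerate(available):
        first_index.setdefault(name, i)  # keep the first occurrence, like match()
    return [first_index[b] for b in wanted if b in first_index]


# Toy data: column names of the count matrix and the barcodes to keep.
columns = ["AAAC-1", "AAAG-1", "AACT-1", "AAGT-1"]
keep = ["AACT-1", "TTTT-1", "AAAC-1"]  # "TTTT-1" is not in the matrix

idx = match_barcodes(keep, columns)
selected = [columns[i] for i in idx]
print(idx)
print(selected)
```

Barcodes that are silently dropped this way are worth counting before building the loom file, since a large number of misses usually means the barcode suffixes differ between the two files.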
Creating a loom file from scratch in R, again with the R package SCopeLoomR
library(SCopeLoomR)
Take a look at the build_loom function
Arguments:
file.name: A string naming the .loom file to be generated.
dgem: A matrix of the gene expression with M genes as rows and N cells as columns.
title: A short description of the content of the loom file.
genome: The genome used for the mapping, e.g. human or mouse.
default.embedding: An N-by-2 data.frame of the embedding (X and Y coordinates) of the cells, one row per cell.
default.embedding.name: A descriptive name for the given default.embedding.
## These arguments are enough for most uses
Adding information to an existing loom file:
# newer SCopeLoomR versions may need mode = "r+" here to allow writing
loom <- open_loom(file.name)
add_hierarchy(loom = loom, hierarchy = create_hierarchy(level.1.name = "Mouse", level.2.name = "Toy Datasets", level.3.name = ""))
add_col_attr(loom=loom, key = "Cell type", value=cell.info$cellType, as.annotation=TRUE)
### Add the Seurat clustering information
seurat.annotation<-read.table(file = paste0(seuratDir, "Res2_Clusters.tsv"), header=TRUE, quote = '', sep = "\t", stringsAsFactors=FALSE)
add_seurat_clustering(loom = loom
, seurat = seurat
, default.clustering.resolution = "res.2"
, annotation = seurat.annotation
, annotation.cluster.id.cn = "res.2"
, annotation.cluster.description.cn = "Annotation")
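Clearly, this package was built specifically for single-cell data analysis.
One thing to remember: the dimensions must match. Every per-cell attribute or embedding you add needs exactly one entry per cell (column) of the matrix.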
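To make the dimension rule concrete, here is a minimal Python sketch of the kind of check worth running before adding a per-cell annotation. `check_col_attr` is a hypothetical helper written for this post, not a SCopeLoomR or loompy function, and the toy matrix is invented.

```python
def check_col_attr(n_cells, values, key):
    """Raise ValueError unless `values` has exactly one entry per cell."""
    if len(values) != n_cells:
        raise ValueError(
            f"column attribute {key!r} has {len(values)} values, "
            f"but the matrix has {n_cells} cells"
        )
    return True


# Toy matrix: 2 genes (rows) x 3 cells (columns).
matrix = [[0, 3, 1],
          [2, 0, 0]]
n_cells = len(matrix[0])

# One label per cell: passes.
ok = check_col_attr(n_cells, ["T cell", "B cell", "T cell"], "Cell type")

# One label too few: rejected before anything is written.
try:
    check_col_attr(n_cells, ["T cell", "B cell"], "Cell type")
    mismatch_caught = False
except ValueError:
    mismatch_caught = True
```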
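Reading loom files in Python
In Python, reading loom files is mainly needed for pyscenic and scanpy
import scanpy as sc
data = sc.read_loom('brain10x.loom', sparse=True)  # a plain call like this is usually enough
The sparse argument controls whether the data matrix is read as a sparse matrix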
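Why `sparse` matters: single-cell count matrices are mostly zeros, so a layout that stores only the non-zero entries is usually much smaller. The back-of-the-envelope sketch below compares dense storage with compressed sparse row (CSR) storage. The matrix size, the 5% non-zero fraction, and the 4-byte value and index widths are illustrative assumptions, not measurements from `brain10x.loom`.

```python
def dense_bytes(n_rows, n_cols, value_bytes=4):
    """Bytes needed to store every entry of an n_rows x n_cols matrix."""
    return n_rows * n_cols * value_bytes


def csr_bytes(n_rows, nnz, value_bytes=4, index_bytes=4):
    """Bytes for CSR: one value and one column index per non-zero entry,
    plus a row-pointer array of length n_rows + 1."""
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


# Assumed example: 10,000 cells x 20,000 genes, 5% of entries non-zero.
n_cells, n_genes = 10_000, 20_000
nnz = n_cells * n_genes * 5 // 100

dense = dense_bytes(n_cells, n_genes)
sparse = csr_bytes(n_cells, nnz)
print(f"dense:  {dense / 1e6:.0f} MB")
print(f"sparse: {sparse / 1e6:.0f} MB")
```

Under these assumptions the sparse layout is roughly a tenth of the dense one; the gap narrows as the non-zero fraction grows.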
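import loompy as lp
# f_pyscenic_output holds the path to a loom file (here, pyscenic output)
lf = lp.connect(f_pyscenic_output, mode='r+', validate=False )
## close the connection when done
lf.close()
Creating loom files in Python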
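import numpy as np
import loompy as lp
# adata is an AnnData object (cells x genes); sample is the output file prefix
row_attrs = {
    "Gene": np.array(adata.var_names) ,
}
col_attrs = {
    "CellID": np.array(adata.obs_names) ,
    "nGene": np.array( np.sum(adata.X.transpose()>0 , axis=0)).flatten() ,
    "nUMI": np.array( np.sum(adata.X.transpose() , axis=0)).flatten() ,
}
# loom stores genes as rows and cells as columns, hence the transpose
lp.create(sample + '.loom', adata.X.transpose(), row_attrs, col_attrs)
## This part is similar to R, but Python is a bit easier to use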
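The `transpose()` calls in the `lp.create` example are there because AnnData stores cells as rows and genes as columns, while loom stores genes as rows and cells as columns. Here is a small NumPy sketch of the per-cell totals on a toy dense matrix. The values are invented; a real `adata.X` is often a SciPy sparse matrix, which is why the code above wraps the sums in `np.array(...).flatten()`.

```python
import numpy as np

# Toy AnnData-style matrix: 3 cells (rows) x 4 genes (columns).
X = np.array([[0, 2, 0, 1],
              [5, 0, 0, 0],
              [1, 1, 1, 1]])

# loom expects genes x cells, so transpose before writing.
loom_matrix = X.T

# Summing over axis 0 of the genes x cells matrix gives one value per cell.
n_gene = np.sum(loom_matrix > 0, axis=0)  # genes detected in each cell
n_umi = np.sum(loom_matrix, axis=0)       # total counts in each cell

print(loom_matrix.shape)
print(n_gene.tolist())
print(n_umi.tolist())
```

Both vectors have one entry per cell, which is exactly what a loom column attribute requires.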
I hope everyone keeps up the habit of learning. Stay angry, and bankrupt Wang Duoyu.