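Reading and writing loom files in R and Python

More and more analyses now rely on loom files, mainly Python-based data analysis and RNA velocity analysis, so converting between formats is something we need to work through. This post shares how to handle loom files in both R and Python.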

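Reading loom files in R

The main tool for reading loom files in R is velocyto.R, the RNA velocity analysis package.
Note that reading loom files in R requires hdf5r, which can be hard to install and may take some digging. Here is one example.
Installing hdf5r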

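### Install the HDF5 library that hdf5r builds against
curl -O https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.1/src/hdf5-1.10.1.tar
tar -xf hdf5-1.10.1.tar   # unpack the source archive before entering it
cd hdf5-1.10.1
./configure
make -j4
make check
make install
### Then, inside R: install.packages("hdf5r")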

library(velocyto.R)
# input.loom: path to the .loom file
data <- read.loom.matrices(input.loom)

Creating loom files in R

Converting rds to loom

library(SCopeLoomR)
library(Seurat)

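# create the output directory
if (!dir.exists(paste0(outdir,'/input'))){
  dir.create(paste0(outdir,'/input'), recursive = TRUE)
}

# rds: path to a saved Seurat object; barcode: path to a CSV file with a Barcode column
data1 <- readRDS(rds)
barcode <- read.table(barcode,sep=',',header = TRUE,stringsAsFactors = FALSE)

# Seurat v3 object (single- or multi-sample): pull the raw count matrix, genes x cells
data2 <- as.matrix(data1@assays$RNA@counts)
# keep only the cells listed in the barcode file; barcodes not found are dropped
m <- na.omit(match(barcode$Barcode,colnames(data2)))
data3 <- data2[,m]

# build the loom file
setwd(paste0(outdir,'/input'))
build_loom(file.name="original.loom",dgem=data3)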
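
For readers more at home in Python, the barcode filtering step above (match followed by na.omit) can be sketched roughly as follows. The helper name match_barcodes and the toy barcodes are made up for illustration, and the indices are 0-based rather than R's 1-based.

```python
def match_barcodes(wanted, available):
    """Mimic R's na.omit(match(wanted, available)), using 0-based indices."""
    # Map each available barcode to its first position, as match() does.
    first_pos = {}
    for i, name in enumerate(available):
        first_pos.setdefault(name, i)
    # Keep the order of `wanted`; barcodes that are absent are skipped.
    return [first_pos[b] for b in wanted if b in first_pos]


# Toy data (hypothetical): 3 genes x 4 cells, genes as rows.
colnames = ["AAAC-1", "AAAG-1", "AACT-1", "AAGG-1"]
counts = [
    [1, 0, 3, 0],
    [0, 2, 0, 5],
    [4, 0, 0, 1],
]
wanted = ["AACT-1", "TTTT-1", "AAAC-1"]  # "TTTT-1" is not in the matrix

idx = match_barcodes(wanted, colnames)
subset = [[row[j] for j in idx] for row in counts]  # like data2[, m]
subset_names = [colnames[j] for j in idx]
print(subset_names)
print(subset)
```

Note that the resulting columns follow the order of the barcode file, not the original column order of the matrix.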

Building a loom file from scratch in R, again with the SCopeLoomR package

library(SCopeLoomR)
Let's look at the build_loom function.
Arguments:
file.name: A string naming the .loom file to be generated.
dgem: A matrix of the gene expression with M genes as rows and N cells as columns (the expression matrix).
title: A short description of content of loom.
genome: The genome used for the mapping, for example human or mouse.
default.embedding: A data.frame of the embedding (X and Y coordinates) of the cells, with one row per cell and two columns (the coordinates file).
default.embedding.name: A description name for the given default.embedding.
## These arguments are enough for most uses
When adding information:
# depending on the SCopeLoomR version, write access may require open_loom(file.name, mode = "r+")
loom <- open_loom(file.name)
add_hierarchy(loom = loom, hierarchy = create_hierarchy(level.1.name = "Mouse", level.2.name = "Toy Datasets", level.3.name = ""))
add_col_attr(loom=loom, key = "Cell type", value=cell.info$cellType, as.annotation=T)
### Add the Seurat information
seurat.annotation<-read.table(file = paste0(seuratDir, "Res2_Clusters.tsv"), header=T, quote = '', sep = "\t", stringsAsFactors=F)
add_seurat_clustering(loom = loom
                      , seurat = seurat
                      , default.clustering.resolution = "res.2"
                      , annotation = seurat.annotation
                      , annotation.cluster.id.cn = "res.2" 
                      , annotation.cluster.description.cn = "Annotation")

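As you can see, this package was built specifically for single-cell data analysis.

One thing to remember: the dimensions must match. Every per-cell value you add (a column attribute or an embedding) needs exactly one entry per cell in the matrix, and every per-gene value needs one entry per gene.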
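
To make the rule concrete, here is a small Python sketch of the kind of check that is worth running before writing a loom file. The helper name find_dimension_problems and the toy data are invented for illustration; it is not part of SCopeLoomR or loompy, just plain Python.

```python
def find_dimension_problems(matrix, row_attrs, col_attrs):
    """Return a list of mismatches; an empty list means everything lines up.

    matrix is a list of rows, genes as rows and cells as columns.
    """
    n_genes = len(matrix)
    n_cells = len(matrix[0]) if n_genes else 0
    problems = []
    # Row attributes describe genes, so they need one value per gene.
    for key, values in row_attrs.items():
        if len(values) != n_genes:
            problems.append(
                f"row attribute '{key}' has {len(values)} values, expected {n_genes}"
            )
    # Column attributes describe cells, so they need one value per cell.
    for key, values in col_attrs.items():
        if len(values) != n_cells:
            problems.append(
                f"column attribute '{key}' has {len(values)} values, expected {n_cells}"
            )
    return problems


# Toy example (hypothetical): 2 genes x 3 cells, one annotation too short.
matrix = [[1, 0, 3], [0, 2, 0]]
row_attrs = {"Gene": ["Actb", "Gapdh"]}
col_attrs = {"CellID": ["c1", "c2", "c3"], "Cell type": ["T cell", "B cell"]}

problems = find_dimension_problems(matrix, row_attrs, col_attrs)
for p in problems:
    print(p)
```

Here only the "Cell type" annotation is reported, because it has two values for three cells.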

Reading loom files in Python

In Python, reading loom files is mostly needed for pyscenic and scanpy.

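import scanpy as sc
# a plain read like this is usually all you need
data = sc.read_loom('brain10x.loom', sparse=True)
The sparse argument controls whether the data matrix is read as a sparse matrix.
import loompy as lp
# f_pyscenic_output: path to the loom file written by pyscenic
lf = lp.connect(f_pyscenic_output, mode='r+', validate=False )
## close the connection when you are done
lf.close()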

Creating loom files in Python

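import numpy as np
import loompy as lp
# adata: an AnnData object (cells x genes); sample: prefix for the output file name
row_attrs = {
    "Gene": np.array(adata.var_names) ,
}
col_attrs = {
    "CellID": np.array(adata.obs_names) ,
    "nGene": np.array( np.sum(adata.X.transpose()>0 , axis=0)).flatten() ,
    "nUMI": np.array( np.sum(adata.X.transpose() , axis=0)).flatten() ,
}
# loom stores genes as rows and cells as columns, hence the transpose
lp.create(sample + '.loom', adata.X.transpose(), row_attrs, col_attrs)
## This is similar to the R approach, but Python is a little more convenient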
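
The transpose and axis choices in the nGene and nUMI lines are easy to get backwards, so here is a toy sketch of what they compute. It uses a small dense NumPy array that I made up as a stand-in for adata.X; with a SciPy sparse matrix the sums typically come back as a 2-D matrix, which is why the original code wraps them in np.array(...).flatten().

```python
import numpy as np

# Toy stand-in for adata.X (hypothetical data): 3 cells (rows) x 4 genes (columns).
X = np.array([
    [0, 2, 0, 1],
    [5, 0, 0, 0],
    [1, 1, 1, 1],
])

# loom orientation: genes as rows, cells as columns.
loom_matrix = X.transpose()

# Summing over axis 0 of the transposed matrix collapses the genes,
# leaving one value per cell.
n_gene = np.array(np.sum(loom_matrix > 0, axis=0)).flatten()  # genes detected per cell
n_umi = np.array(np.sum(loom_matrix, axis=0)).flatten()       # total counts per cell

print(loom_matrix.shape)
print(n_gene)
print(n_umi)
```

Both vectors have one entry per cell, which is what a column attribute needs; summing the untransposed X over axis=1 would give the same numbers.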

I hope everyone keeps up the habit of learning. Stay angry, and make Wang Duoyu go bankrupt.