Introduction to the loom file format: http://linnarssonlab.org/loompy/format/index.html
Seurat: https://satijalab.org/seurat/mca_loom.html
loomR tutorial: https://satijalab.org/loomR/loomR_tutorial.html
knitr::opts_chunk$set(echo = TRUE)
options(encoding = "UTF-8")
Loom file structure
A loom file is simply an HDF5 file with a strict structure imposed on it. This structure keeps an otherwise unordered binary file consistent and makes it unambiguous which dataset holds which data. Below is a summary of the file structure and the rules imposed on each dataset; for more details, read the loom file specification.
matrix
The root of a loom file's structure: a two-dimensional dataset of n genes by m cells
layers
Alternative representations of the data in matrix, must have the same dimensions as matrix
row_attrs and col_attrs
Metadata for rows (genes) and columns (cells), respectively; each dataset in these groups must be one- or two-dimensional, and its first dimension must be n for row_attrs or m for col_attrs
row_graphs and col_graphs
Sparse cluster graphs in coordinate form; each graph is a group with three equal-length datasets: a (row index), b (column index), and w (value)
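To make the rules above concrete, here is a minimal sketch in base R. It does not read a real loom file: it builds a hypothetical in-memory list (mock_loom) that mirrors the HDF5 groups, plus a hypothetical helper check_loom_layout() that enforces the dimension rules. Neither name comes from loomR or loompy, and index ranges inside the graphs are not checked.

```r
# Hypothetical helper (not part of loomR): check a plain R list that mimics
# the loom layout against the structural rules listed above.
check_loom_layout <- function(x) {
  n <- nrow(x$matrix)  # number of genes
  m <- ncol(x$matrix)  # number of cells
  # Every layer must have exactly the same dimensions as matrix
  for (layer in x$layers) {
    stopifnot(identical(dim(layer), dim(x$matrix)))
  }
  # Attribute datasets are 1-D or 2-D; first dimension is n (rows) or m (columns)
  first_dim <- function(v) if (is.null(dim(v))) length(v) else dim(v)[1]
  for (a in x$row_attrs) stopifnot(length(dim(a)) <= 2, first_dim(a) == n)
  for (a in x$col_attrs) stopifnot(length(dim(a)) <= 2, first_dim(a) == m)
  # Each graph holds three equal-length vectors: a (row), b (column), w (value)
  for (g in c(x$row_graphs, x$col_graphs)) {
    stopifnot(all(c("a", "b", "w") %in% names(g)),
              length(g$a) == length(g$b),
              length(g$b) == length(g$w))
  }
  invisible(TRUE)
}

# Hypothetical mock: 4 genes x 3 cells
counts <- matrix(1:12, nrow = 4)
mock_loom <- list(
  matrix     = counts,
  layers     = list(log1p = log1p(counts)),
  row_attrs  = list(Gene = paste0("gene", 1:4)),
  col_attrs  = list(CellID = paste0("cell", 1:3)),
  col_graphs = list(knn = list(a = c(1L, 2L), b = c(2L, 3L), w = c(1, 0.5)))
)

check_loom_layout(mock_loom)  # passes silently

# A col_attrs vector with the wrong length violates the rules
broken <- mock_loom
broken$col_attrs$CellID <- "cell1"
inherits(try(check_loom_layout(broken), silent = TRUE), "try-error")  # TRUE
```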
library(Seurat)
library(loomR)
library(dplyr)
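# Write the raw counts (genes x cells) of Seurat's bundled pbmc_small dataset to a new loom file
pbmc_small_loom <- create(filename = "pbmc.small.loom", data = pbmc_small@assays$RNA@counts, overwrite = TRUE)
# Alternatively, convert the Seurat object directly with as.loom()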
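The as.loom() route converts a whole Seurat object in one call. A sketch assuming Seurat v3 with loomR installed; the output path as_loom.demo.loom is a hypothetical choice, and verbose = FALSE follows the Seurat loom vignette linked at the top.

```r
# Assumes Seurat v3 and loomR; the output path is hypothetical
library(Seurat)
library(loomR)

lfile <- as.loom(pbmc_small,
                 filename = file.path(tempdir(), "as_loom.demo.loom"),
                 verbose = FALSE)
lfile                       # overview of the datasets written to the file
loom_shape <- lfile$shape   # index 1 = genes, index 2 = cells
lfile$close_all()
loom_shape
```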
Inspecting key information in the loom file
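# Print an overview of the file's contents
pbmc_small_loom
# First 6 x 6 block of the main matrix
pbmc_small_loom$matrix[1:6, 1:6]
# First six cell IDs (col_attrs) and gene names (row_attrs)
pbmc_small_loom$col.attrs$CellID[1:6]
pbmc_small_loom$row.attrs$Gene[1:6]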
Extracting information from the loom file
Because metadata is stored in several datasets within a loom file, each loom object has a get.attribute.df method, which collects various metadata datasets and organizes them into a data frame for ease of use. This method takes a direction to look in (MARGIN = 1 for row (gene) metadata, MARGIN = 2 for column (cell) metadata) and the names of the metadata datasets to collect. The MARGIN convention is explained below; the "Chunk-based iteration" section of the loomR tutorial covers it in more detail.
MARGIN:
Several methods for a loom object have a MARGIN argument; this argument tells the loom file on which dimension to iterate over, add, or fetch data. To keep consistent with other R tools for single-cell RNAseq analysis, a MARGIN of 1 represents the rows, or genes, while a MARGIN of 2 represents the columns, or cells. This also applies to the shape field of a loom object: index 1 represents the number of genes in a loom file while index 2 represents the number of cells.
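Base R's apply() uses the same row/column convention, so a plain matrix makes a handy mental model. The sketch below uses only base R (no loom file): a tiny made-up genes x cells matrix is summed per gene (MARGIN = 1) and per cell (MARGIN = 2), followed by a hypothetical chunked_col_sums() helper that processes cells a few columns at a time, the same idea behind loomR's chunk-based iteration. The helper and its chunk_size argument are illustrative assumptions, not loomR methods.

```r
# Made-up counts: rows are genes (MARGIN = 1), columns are cells (MARGIN = 2)
counts <- matrix(c(0, 2, 5,
                   1, 0, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB"),
                                 c("cell1", "cell2", "cell3")))

apply(counts, MARGIN = 1, FUN = sum)  # per gene: GeneA 7, GeneB 4
apply(counts, MARGIN = 2, FUN = sum)  # per cell: cell1 1, cell2 2, cell3 8

# Hypothetical helper: sum each cell's counts, touching only chunk_size
# columns at a time instead of the whole matrix at once
chunked_col_sums <- function(mat, chunk_size = 2) {
  idx <- seq_len(ncol(mat))
  chunks <- split(idx, ceiling(idx / chunk_size))
  res <- unlist(lapply(chunks, function(j) colSums(mat[, j, drop = FALSE])),
                use.names = FALSE)
  names(res) <- colnames(mat)
  res
}

chunked_col_sums(counts)  # same result as colSums(counts)
```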
pbmc_small_loom$get.attribute.df(MARGIN = 1,attributes = "Gene")[1:6,]
pbmc_small_loom$get.attribute.df(MARGIN = 2,attributes = "CellID")[1:6,]
Adding information to the loom file
We can add layers, gene-level metadata (row_attrs), and cell-level metadata (col_attrs) to a loom object using loomR. Full details are in the loom file specification.
The loom class provides add.layer, add.row.attribute, and add.col.attribute for adding layers and metadata. All of these methods take a named list of either matrices or vectors. For example, ENSEMBL IDs can be added to the gene-level metadata as follows:
# Generate random ENSEMBL IDs for demonstration purposes
ensembl.ids <- paste0("ENSG0000", 1:length(x = pbmc_small_loom$row.attrs$Gene[]))
# Use add.row.attribute to add the IDs. Note that to overwrite an
# existing value, you must set overwrite = TRUE
pbmc_small_loom$add.row.attribute(list(ensembl.id = ensembl.ids), overwrite = TRUE)
pbmc_small_loom$get.attribute.df(MARGIN = 1)[1:6,]
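Cell-level metadata works the same way through add.col.attribute. The sketch below is self-contained and assumes Seurat v3 and loomR are installed. It writes pbmc_small to a temporary file (the file name clusters.demo.loom and the attribute name cluster are hypothetical choices), checks that the loom column order matches the Seurat cells before attaching anything, and then stores the cluster labels returned by Idents().

```r
# Assumes Seurat v3 and loomR; file and attribute names are hypothetical
library(Seurat)
library(loomR)

loom_path <- file.path(tempdir(), "clusters.demo.loom")
lfile <- create(filename = loom_path,
                data = GetAssayData(pbmc_small, slot = "counts"),
                overwrite = TRUE)

# Attributes are matched to cells by position, so confirm the order first
stopifnot(identical(lfile$col.attrs$CellID[], colnames(pbmc_small)))

# add.col.attribute takes a named list; each vector needs one entry per cell
clusters <- as.character(Idents(pbmc_small))
lfile$add.col.attribute(list(cluster = clusters))

head(lfile$get.attribute.df(MARGIN = 2, attributes = c("CellID", "cluster")))
lfile$close_all()
```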
Converting to Seurat and running Seurat steps
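# Convert the loom file into an in-memory Seurat object
pbmc_small_seurat <- as.Seurat(pbmc_small_loom)
# Log-normalize the counts, then scale the data
pbmc_small_seurat <- NormalizeData(pbmc_small_seurat) %>% ScaleData()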
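Once converted, the object can go through the usual Seurat workflow. A self-contained sketch, assuming Seurat v3 with loomR: it rebuilds the loom file from pbmc_small in a temporary path (pipeline.demo.loom is a hypothetical name), converts it with as.Seurat(), closes the file once the data have been read, and continues with variable-feature selection and PCA. The choice of npcs = 10 is an arbitrary value for a dataset this small.

```r
# Assumes Seurat v3 and loomR; "pipeline.demo.loom" is a hypothetical path
library(Seurat)
library(loomR)
library(dplyr)

loom_path <- file.path(tempdir(), "pipeline.demo.loom")
lfile <- create(filename = loom_path,
                data = GetAssayData(pbmc_small, slot = "counts"),
                overwrite = TRUE)
obj <- as.Seurat(lfile)
lfile$close_all()

# Normalize, pick variable genes, scale, and run a small PCA
obj <- obj %>%
  NormalizeData() %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA(npcs = 10, verbose = FALSE)

dim(Embeddings(obj, reduction = "pca"))  # cells x 10 PCs
```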
Closing the loom file
# Close all open handles to the underlying HDF5 file
pbmc_small_loom$close_all()
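After close_all(), the file on disk stays intact and can be reopened later with loomR's connect(). A self-contained sketch assuming loomR and Seurat v3 (reopen.demo.loom is a hypothetical file name): mode = "r" opens the file read-only, while mode = "r+" allows adding attributes or layers.

```r
# Assumes Seurat v3 and loomR; the file name is hypothetical
library(Seurat)
library(loomR)

loom_path <- file.path(tempdir(), "reopen.demo.loom")
lfile <- create(filename = loom_path,
                data = GetAssayData(pbmc_small, slot = "counts"),
                overwrite = TRUE)
lfile$close_all()

# Reconnect read-only; use mode = "r+" to modify the file
lfile <- connect(filename = loom_path, mode = "r")
loom_shape <- lfile$shape  # index 1 = genes, index 2 = cells
lfile$close_all()
loom_shape
```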