pData() function
Retrieve information on experimental phenotypes recorded in eSet and ExpressionSet-derived classes.
newCellDataSet() function
Creates a monocle object (a CellDataSet):
newCellDataSet(cellData, phenoData = NULL, featureData = NULL, lowerDetectionLimit = 0.1, expressionFamily = VGAM::negbinomial.size())
Arguments:
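cellData: expression data matrix for an experiment. The expression matrix is required, normally as an object of class matrix.
For 10X data, however, the matrix produced by the Cell Ranger pipelines is already in sparseMatrix format, so do not convert it with as.matrix() at this step; the dense copy would consume a large amount of memory.
phenoData: data frame containing attributes of individual cells, i.e. the sample attributes.
featureData: data frame containing attributes of features (e.g. genes), i.e. the gene annotation. One column must be named "gene_short_name" and hold the gene symbol.
expressionFamily: the VGAM family function to be used for expression response variables, i.e. the distribution used to model the expression values. Raw counts or UMI counts are best modeled with a negative binomial distribution, which corresponds to negbinomial.size().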
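The arguments above can be put together in a minimal sketch on simulated data, assuming Monocle 2 (the monocle package) and Matrix are installed. The gene names, cell names, the batch column and the value lowerDetectionLimit = 0.5 are invented for illustration. Although the argument descriptions say "data frame", Monocle 2 expects phenoData and featureData wrapped as AnnotatedDataFrame objects, which is what the sketch does.

```r
library(monocle)  # Monocle 2; Biobase comes in as a dependency
library(Matrix)

set.seed(1)
n_genes <- 200
n_cells <- 50

# Simulated UMI counts, stored directly as a sparse matrix (no as.matrix()).
counts <- Matrix(as.numeric(rpois(n_genes * n_cells, lambda = 0.5)),
                 nrow = n_genes, ncol = n_cells, sparse = TRUE)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("cell", seq_len(n_cells))

# Cell attributes: one row per cell, row names matching colnames(counts).
pd <- new("AnnotatedDataFrame",
          data = data.frame(batch = rep(c("a", "b"), length.out = n_cells),
                            row.names = colnames(counts)))

# Gene attributes: one row per gene, including the gene_short_name column.
fd <- new("AnnotatedDataFrame",
          data = data.frame(gene_short_name = rownames(counts),
                            row.names = rownames(counts)))

# Negative binomial family, as recommended for count/UMI data.
cds <- newCellDataSet(counts, phenoData = pd, featureData = fd,
                      lowerDetectionLimit = 0.5,
                      expressionFamily = VGAM::negbinomial.size())
```

With real 10X data, the simulated counts object would be replaced by the sparse matrix read from the Cell Ranger output.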
estimateSizeFactors() function
These generic functions provide basic interfaces to operations on and data access to count datasets.
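If the object was created above with expressionFamily set to negbinomial() or negbinomial.size(), this step only requires estimateSizeFactors() and estimateDispersions(): the former normalizes for differences between cells, and the latter computes the dispersion values that later steps rely on.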
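As a sketch, the two calls can be wrapped in a small helper. prepare_cds is a hypothetical name, not a monocle function, and cds is assumed to be a CellDataSet created with a negative binomial expressionFamily.

```r
library(monocle)

# Hypothetical helper: runs the two normalization steps in order.
prepare_cds <- function(cds) {
  cds <- estimateSizeFactors(cds)  # per-cell scaling factors
  cds <- estimateDispersions(cds)  # mean-variance relationship, needed later
  cds
}
```

Usage would look like HSMM <- prepare_cds(HSMM), after which dispersionTable(HSMM) becomes available.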
detectGenes()
Gene filtering: HSMM <- detectGenes(HSMM, min_expr = 0.1)
print(head(fData(HSMM)))
fData(): These generic functions access feature data (experiment specific information about features) and feature meta-data (e.g., descriptions of feature covariates). Here it is used to extract the information on the expressed genes (features).
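detectGenes() itself only counts; the filtering happens when the counts it stores are used to subset genes. A minimal sketch on simulated data, assuming Monocle 2 and Matrix are installed; the gene and cell names and the cutoff of 10 cells are illustrative choices, not values prescribed by these notes.

```r
library(monocle)
library(Matrix)

set.seed(1)
n_genes <- 200
n_cells <- 50

# Simulated counts with a gene-specific mean, so some genes are rarely detected.
gene_mean <- runif(n_genes, min = 0.05, max = 1)
counts <- Matrix(as.numeric(rpois(n_genes * n_cells,
                                  lambda = rep(gene_mean, times = n_cells))),
                 nrow = n_genes, ncol = n_cells, sparse = TRUE)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("cell", seq_len(n_cells))

fd <- new("AnnotatedDataFrame",
          data = data.frame(gene_short_name = rownames(counts),
                            row.names = rownames(counts)))
cds <- newCellDataSet(counts, featureData = fd,
                      expressionFamily = VGAM::negbinomial.size())

# Adds num_cells_expressed to fData() and num_genes_expressed to pData().
cds <- detectGenes(cds, min_expr = 0.1)

# Keep genes detected in at least 10 cells (illustrative threshold).
expressed_genes <- row.names(subset(fData(cds), num_cells_expressed >= 10))
```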
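- classifyCells()
Identifies and classifies cell types based on classic marker genes. Drawback: the cells that get classified are mostly the ones you are looking for, but many cells may be lost in the process.
- Classifying cells without marker genes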
2.1 dispersionTable(): Calling estimateDispersions computes a smooth function describing how variance in each gene's expression across cells varies according to the mean. This function only works for CellDataSet objects containing count-based expression data, either transcripts or reads.
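One way the dispersion table is typically used is to pick genes above a mean-expression cutoff and mark them for downstream steps. The helper below is a sketch: mark_genes_by_dispersion is a hypothetical name, the default cutoff of 0.1 is an illustrative assumption, and cds is assumed to have already been through estimateDispersions().

```r
library(monocle)

# Hypothetical helper: select genes by mean expression from the dispersion table.
mark_genes_by_dispersion <- function(cds, min_mean = 0.1) {
  disp_table <- dispersionTable(cds)  # columns include gene_id and mean_expression
  keep <- subset(disp_table, mean_expression >= min_mean)
  setOrderingFilter(cds, keep$gene_id)  # flags the selected genes in the object
}
```

Usage would look like HSMM <- mark_genes_by_dispersion(HSMM).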
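Constructing cell trajectories (the signature feature of the monocle package)
Paradox: use as little prior biological knowledge as possible and discover the important features purely from the data.
- Selecting genes by differential analysis between time points is usually a very effective way to build a trajectory; this requires samples collected at different stages of the biological process.
- If the first approach is not possible, monocle can reconstruct the trajectory from a population of single cells captured at the same time.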
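The first option can be sketched with Monocle 2's differential test followed by its ordering functions. build_trajectory is a hypothetical helper name; the formula "~Time" assumes pData(cds) contains a column named Time, and the q-value cutoff of 0.01 is an illustrative choice. cds is assumed to have size factors and dispersions estimated already.

```r
library(monocle)

# Hypothetical helper: order cells using genes that change between time points.
build_trajectory <- function(cds, time_formula = "~Time", qval_cutoff = 0.01) {
  # 1. Test each gene for differences across the time points.
  diff_res <- differentialGeneTest(cds, fullModelFormulaStr = time_formula)
  ordering_genes <- row.names(subset(diff_res, qval < qval_cutoff))

  # 2. Mark the significant genes as the ones used for ordering.
  cds <- setOrderingFilter(cds, ordering_genes)

  # 3. Reduce to two dimensions with DDRTree.
  cds <- reduceDimension(cds, max_components = 2, method = "DDRTree")

  # 4. Place the cells along the learned trajectory (assigns pseudotime).
  orderCells(cds)
}
```

Usage would look like HSMM <- build_trajectory(HSMM), followed by plot_cell_trajectory(HSMM) to inspect the result.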