scRNA---Day11(monocle)

pData() function
Retrieve information on experimental phenotypes recorded in eSet and ExpressionSet-derived classes.

newCellDataSet() function
Creates the monocle object:
newCellDataSet(cellData, phenoData = NULL, featureData = NULL, lowerDetectionLimit = 0.1, expressionFamily = VGAM::negbinomial.size())
Arguments:
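cellData: expression data matrix for an experiment. This is the expression matrix, of type matrix.
For 10X data, however, the output of the Cell Ranger RNA-Seq pipelines is already in sparseMatrix format, so do not convert it with as.matrix() at this step; doing so would consume a large amount of memory.
phenoData: data frame containing attributes of individual cells, i.e. the sample attributes.
featureData: data frame containing attributes of features (e.g. genes), i.e. the gene information. One column must be named "gene_short_name" and hold the gene symbol.
expressionFamily: the VGAM family function to be used for expression response variables, i.e. the distribution the expression matrix is modeled with. Counts or UMI counts are best modeled with a negative binomial distribution, i.e. negbinomial.size().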
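As a rough illustration of the arguments above, the sketch below builds a simulated sparse count matrix, compares its memory footprint with the dense version, and passes it to newCellDataSet(). The toy data, the column name "sample" and the gene and cell names are made up for illustration. The monocle part only runs if the package is installed, and it assumes phenoData and featureData are supplied as AnnotatedDataFrame objects.

```r
library(Matrix)

set.seed(1)
n_genes <- 2000
n_cells <- 200

# Simulated UMI-like counts, about 95% zeros (toy data, not a real experiment)
counts <- rsparsematrix(n_genes, n_cells, density = 0.05,
                        rand.x = function(n) rpois(n, 3) + 1)
dimnames(counts) <- list(paste0("gene", seq_len(n_genes)),
                         paste0("cell", seq_len(n_cells)))

# The sparse representation is much smaller than the dense one
print(object.size(counts), units = "Kb")
print(object.size(as.matrix(counts)), units = "Kb")

# Row names must match the column names (cells) and row names (genes) of counts
pheno_df <- data.frame(sample = rep(c("A", "B"), length.out = n_cells),
                       row.names = colnames(counts))
feature_df <- data.frame(gene_short_name = rownames(counts),
                         row.names = rownames(counts))

if (requireNamespace("monocle", quietly = TRUE)) {
  library(monocle)
  pd <- new("AnnotatedDataFrame", data = pheno_df)
  fd <- new("AnnotatedDataFrame", data = feature_df)
  # Pass the sparse matrix directly, without as.matrix()
  cds <- newCellDataSet(counts, phenoData = pd, featureData = fd,
                        lowerDetectionLimit = 0.1,
                        expressionFamily = VGAM::negbinomial.size())
  print(head(pData(cds)))
}
```

With real 10X data the matrix would come from the Cell Ranger output instead of rsparsematrix().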

estimateSizeFactors() function
These generic functions provide basic interfaces to operations on and data access to count datasets.
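If the expressionFamily chosen when creating the object above was negbinomial() or negbinomial.size(), this step only needs estimateSizeFactors() and estimateDispersions(). They normalize for differences between cells and compute the dispersion values, which are used later.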
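To make the idea of a size factor concrete, here is a minimal base-R sketch. It assumes one simple definition (each cell's total count divided by the geometric mean of all cell totals); the method estimateSizeFactors() actually applies may differ, and the simulated counts are purely illustrative.

```r
set.seed(42)

# Toy counts: 100 genes x 6 cells; cells 4-6 are simulated at 3x the depth
depth <- c(1, 1, 1, 3, 3, 3)
counts <- sapply(depth, function(d) rpois(100, lambda = 5 * d))
colnames(counts) <- paste0("cell", 1:6)

# Assumed definition: total count per cell / geometric mean of the totals
cell_total <- colSums(counts)
size_factor <- cell_total / exp(mean(log(cell_total)))
print(round(size_factor, 2))

# Dividing each cell (column) by its size factor removes the depth difference
normalized <- sweep(counts, 2, size_factor, "/")
print(colSums(normalized))

# On a CellDataSet the corresponding calls would be:
# HSMM <- estimateSizeFactors(HSMM)
# HSMM <- estimateDispersions(HSMM)
```

After normalization every cell has the same total, so the remaining differences reflect relative expression rather than sequencing depth.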

detectGenes()
Filter genes: HSMM <- detectGenes(HSMM, min_expr = 0.1)
print(head(fData(HSMM)))
fData(): These generic functions access feature data (experiment specific information about features) and feature meta-data (e.g., descriptions of feature covariates). Here it extracts the information on the expressed genes (features).
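The filtering step can be pictured with a small base-R sketch: count, for every gene, the cells in which it exceeds min_expr, then keep the genes detected in enough cells. The toy matrix and the cutoff min_cells are hypothetical; on a real CellDataSet, detectGenes() stores comparable counts in fData() and pData().

```r
set.seed(7)

# Toy expression matrix: 20 genes x 8 cells (simulated)
expr <- matrix(rpois(20 * 8, lambda = 0.5), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("cell", 1:8)))

min_expr <- 0.1  # same threshold as in the detectGenes() call above

# Number of cells in which each gene is detected
num_cells_expressed <- rowSums(expr > min_expr)
# Number of genes detected in each cell
num_genes_expressed <- colSums(expr > min_expr)

# Keep genes detected in at least min_cells cells (hypothetical cutoff)
min_cells <- 3
expressed_genes <- names(num_cells_expressed)[num_cells_expressed >= min_cells]

print(num_cells_expressed)
print(expressed_genes)
```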

classifyCells()
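Determines and classifies cell types based on classic marker genes. Drawback: the cells you obtain are mostly the ones you want, but many cells may be lost in the process.
Classifying cells without marker genes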
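The drawback of marker-based classification can be seen in a small base-R sketch. It does not call classifyCells() itself (in monocle the rules are supplied through a cell type hierarchy built with newCellTypeHierarchy() and addCellType()); the marker names MYF5 and ANPEP, the thresholds and the expression values are illustrative assumptions.

```r
# Rows: marker genes; columns: cells (made-up values)
expr <- rbind(
  MYF5  = c(5, 3, 0, 0, 0, 0),
  ANPEP = c(0, 0, 4, 6, 0, 1)
)
colnames(expr) <- paste0("cell", 1:6)

# Hypothetical gating rules based on the two markers
is_myoblast   <- expr["MYF5", ] >= 1
is_fibroblast <- expr["MYF5", ] < 1 & expr["ANPEP", ] > 1

# Cells that match no rule stay unclassified
cell_type <- ifelse(is_myoblast, "Myoblast",
                    ifelse(is_fibroblast, "Fibroblast", "Unknown"))

print(table(cell_type))
```

Two of the six toy cells match neither rule and end up as "Unknown", which is how strict marker thresholds lose cells.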
2.1 dispersionTable():Calling estimateDispersions computes a smooth function describing how variance in each gene's expression across cells varies according to the mean. This function only works for CellDataSet objects containing count-based expression data, either transcripts or reads.
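To show what "variance as a function of the mean" looks like for count data, the sketch below simulates negative binomial counts and recovers a per-gene dispersion with a method-of-moments estimate, using variance = mean + dispersion * mean^2. This is not monocle's fitting procedure; the simulation parameters and the column names of the resulting table are assumptions made for illustration.

```r
set.seed(11)
n_genes <- 200
n_cells <- 500
true_disp <- 0.5                         # assumed dispersion for the simulation
mu <- runif(n_genes, min = 1, max = 50)  # assumed per-gene mean expression

# Negative binomial counts, genes in rows and cells in columns
counts <- t(sapply(mu, function(m) rnbinom(n_cells, mu = m, size = 1 / true_disp)))
rownames(counts) <- paste0("gene", seq_len(n_genes))

gene_mean <- rowMeans(counts)
gene_var  <- apply(counts, 1, var)

# Method-of-moments estimate: (variance - mean) / mean^2
disp_sketch <- data.frame(
  gene_id = rownames(counts),
  mean_expression = gene_mean,
  dispersion_empirical = (gene_var - gene_mean) / gene_mean^2
)

print(head(disp_sketch))
print(median(disp_sketch$dispersion_empirical))
```

Genes can then be filtered on the table, for example by keeping those whose mean expression exceeds a chosen threshold.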

Constructing cell trajectories (the signature feature of the monocle package)

The paradox: use as little existing biological background knowledge as possible and discover the important features purely from the data.

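1. Selecting genes through differential expression analysis between time points is usually a very effective way to build the trajectory; this applies when the samples were collected at different stages of the biological process.
2. If option 1 is not possible, monocle can reconstruct the trajectory from a single population of cells captured at the same time.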