monocle2的使用

monocle2和monocle3的基本原理是一样的,详见monocle3系列之七:总结 - 简书 (jianshu.com)
总的来说,monocle2用起来更顺手,如果数据不是很大的话,还是更推荐monocle2。本文直接介绍monocle2的使用流程。

1)构建CellDataSet class

需要输入以下3个文件


HSMM_expr_matrix <- read.table("fpkm_matrix.txt")
HSMM_sample_sheet <- read.delim("cell_sample_sheet.txt")
HSMM_gene_annotation <- read.delim("gene_annotations.txt")
pd <- new("AnnotatedDataFrame", data = HSMM_sample_sheet)
fd <- new("AnnotatedDataFrame", data = HSMM_gene_annotation)
HSMM <- newCellDataSet(as.matrix(HSMM_expr_matrix), phenoData = pd, featureData = fd)

10X Genomics的数据可以通过以下方法载入:

cellranger_pipestance_path <- "/path/to/your/pipeline/output/directory"
gbm <- load_cellranger_matrix(cellranger_pipestance_path)

fd <- fData(gbm)
colnames(fd)[2] <- "gene_short_name"
gbm_cds <- newCellDataSet(exprs(gbm),
                  phenoData = new("AnnotatedDataFrame", data = pData(gbm)),
                  featureData = new("AnnotatedDataFrame", data = fd),
                  lowerDetectionLimit = 0.5,
                  expressionFamily = negbinomial.size())

Monocle 接受相对表达数据和基于计数的测量(例如 UMIs)。一般来说,在 UMI 数据上效果最佳。FPKM/TPM 值通常呈对数正态分布,而 UMI 或读数计数则更适合用负二项分布建模。数据分布可以参考以下内容:


HSMM <- newCellDataSet(count_matrix,
                phenoData = pd,
                featureData = fd,
                expressionFamily=negbinomial.size())

2)计算 size factors and dispersions

HSMM <- estimateSizeFactors(HSMM)
HSMM <- estimateDispersions(HSMM)

3)过滤低质量细胞

HSMM <- detectGenes(HSMM, min_expr = 0.1)
expressed_genes <- row.names(subset(fData(HSMM), num_cells_expressed >= 10))
print(head(fData(HSMM)))
gene_short_name biotype num_cells_expressed use_for_ordering
ENSG00000000003.10 TSPAN6 184 FALSE
ENSG00000000005.5 TNMD 0 FALSE
ENSG00000000419.8 DPM1 211 FALSE
ENSG00000000457.8 SCYL3 18 FALSE
ENSG00000000460.12 C1orf112 47 TRUE
ENSG00000000938.8 FGR 0 FALSE

4)Classifying and Counting Cells

Classifying cells by type

MYF5_id <- row.names(subset(fData(HSMM), gene_short_name == "MYF5"))
ANPEP_id <- row.names(subset(fData(HSMM),gene_short_name == "ANPEP"))

cth <- newCellTypeHierarchy()
cth <- addCellType(cth, "Myoblast", classify_func =function(x){ x[MYF5_id,] >= 1 })
cth <- addCellType(cth, "Fibroblast", classify_func = function(x){ x[MYF5_id,] < 1 & x[ANPEP_id,] > 1 })
HSMM <- classifyCells(HSMM, cth, 0.1)

Clustering cells without marker genes

disp_table <- dispersionTable(HSMM)
unsup_clustering_genes <- subset(disp_table, mean_expression >= 0.1)
HSMM <- setOrderingFilter(HSMM, unsup_clustering_genes$gene_id)
plot_ordering_genes(HSMM)

Clustering cells using marker genes

marker_diff <- markerDiffTable(HSMM[expressed_genes,],
            cth,
            residualModelFormulaStr = "~Media + num_genes_expressed",
            cores = 1)
candidate_clustering_genes <- row.names(subset(marker_diff, qval < 0.01))
marker_spec <- calculateMarkerSpecificity(HSMM[candidate_clustering_genes,], cth)
semisup_clustering_genes <- unique(selectTopMarkers(marker_spec, 500)$gene_id)
HSMM <- setOrderingFilter(HSMM, semisup_clustering_genes)

Imputing cell type

HSMM <- clusterCells(HSMM,
              num_clusters = 2,
              frequency_thresh = 0.1,
              cell_type_hierarchy = cth)

5)构建细胞轨迹

构建细胞轨迹需要3个步骤:

  • 选择构建轨迹的基因
diff_test_res <- differentialGeneTest(HSMM_myo[expressed_genes,], fullModelFormulaStr = "~Media")
ordering_genes <- row.names (subset(diff_test_res, qval < 0.01))
HSMM_myo <- setOrderingFilter(HSMM_myo, ordering_genes)
  • 降维
HSMM_myo <- reduceDimension(HSMM_myo, max_components = 2,method = 'DDRTree')
  • 排列细胞
HSMM_myo <- orderCells(HSMM_myo)
plot_cell_trajectory(HSMM_myo, color_by = "Hours")
plot_cell_trajectory(HSMM_myo, color_by = "Pseudotime")

6)差异表达分析

根据细胞类型找差异基因

diff_test_res <- differentialGeneTest(cds_subset, fullModelFormulaStr = "~CellType")
diff_test_res[,c("gene_short_name", "pval", "qval")]
plot_genes_jitter(cds_subset,
                  grouping = "CellType",
                  color_by = "CellType",
                  nrow= 1,
                  ncol = NULL,
                  plot_trend = TRUE)

根据伪时间找差异基因

diff_test_res <- differentialGeneTest(cds_subset, fullModelFormulaStr = "~sm.ns(Pseudotime)")
diff_test_res[,c("gene_short_name", "pval", "qval")]
plot_genes_in_pseudotime(cds_subset, color_by = "Hours")

monocle2经典热图

diff_test_res <- differentialGeneTest(HSMM_myo[marker_genes,],
              fullModelFormulaStr = "~sm.ns(Pseudotime)")
sig_gene_names <- row.names(subset(diff_test_res, qval < 0.1))
plot_pseudotime_heatmap(HSMM_myo[sig_gene_names,],
                num_clusters = 3,
                cores = 1,
                show_rownames = T)

总结

通常情况下,我们的数据都是已经过滤了低质量细胞,做完了细胞类型注释,所以在monocle2中只需要执行上述步骤中的1)2)5)6)。

©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

友情链接更多精彩内容