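Over the previous two lessons we repeatedly compared monocle3 with monocle2 and Seurat, so by now you should be comfortable with monocle3's syntax and features (see [Pseudotime Analysis, Part 5: Dimensionality Reduction, Partitioning, and Clustering in monocle3](http://mp.weixin.qq.com/s?__biz=MzAwMzIzOTk5OQ==&mid=2247489494&idx=1&sn=44f0a44857c12fcf9a1ab84e4f2cbebf&chksm=9b3f6e86ac48e79026321de2fdca81a139d104d655884ffe87a68e0a1551ddffa5af7919d19e&scene=21#wechat_redirect)).
The single-cell pseudotime series will keep being updated, so stay tuned.
The video tutorial is up on Bilibili with a newly made intro; likes, coins, and favorites are all welcome: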
https://www.bilibili.com/video/BV1br4y1x7Hf?p=10
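Written tutorial
In my view, the biggest highlights of monocle3 compared with monocle2 come down to these two:
1. You can choose the starting point of pseudotime interactively
2. You can display the trajectory plot in 3D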
3.1. Preprocessing the data
rm(list = ls()); gc()  # clear the workspace and start over
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 11383036 608.0 27726538 1480.8 27726538 1480.8
## Vcells 21313689 162.7 65077689 496.6 198600815 1515.3
# Same routine as before: read the data and create the cds object
library(monocle3)
library(magrittr)  # supplies the %>% pipe used further down
expression_matrix <- readRDS('author.pro/expression_matrix.rds')
cell_metadata <- readRDS('author.pro/cell_metadata.rds')
gene_annotation <- readRDS('author.pro/gene_annotation.rds')
cds <- new_cell_data_set(expression_matrix, cell_metadata = cell_metadata, gene_metadata = gene_annotation)
cds <- preprocess_cds(cds, num_dim = 50)  # preprocessing
cds <- align_cds(cds, alignment_group = "plate", residual_model_formula_str = '~Size_Factor')
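residual_model_formula_str lets you name continuous variables to regress out during batch correction, for example the expression value of a gene or gene set. If such variables were instead passed as categorical variables, every distinct value would be treated as its own batch, which produces a huge amount of redundant computation.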
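To make the contrast concrete, here is a minimal base-R sketch on simulated data. It is not monocle3's internal code, and every variable name in it is hypothetical. Regressing out a continuous covariate with a linear model costs a single slope coefficient, whereas turning the same covariate into a factor creates one level per distinct value.

```r
set.seed(1)
n <- 200
size_factor <- runif(n, 0.5, 2)              # simulated continuous covariate
pc1 <- 3 * size_factor + rnorm(n, sd = 0.1)  # simulated coordinate that depends on it

# Continuous covariate: one slope, and the residuals are the corrected values
fit_cont <- lm(pc1 ~ size_factor)
corrected <- residuals(fit_cont)
n_coef_cont <- length(coef(fit_cont))        # intercept + slope

# Same covariate treated as categorical: one level per distinct value
n_levels <- nlevels(factor(size_factor))

cat("coefficients (continuous):", n_coef_cont, "\n")
cat("levels (categorical):", n_levels, "\n")
cat("correlation after correction:", round(cor(corrected, size_factor), 3), "\n")
```

The residuals of the continuous fit are uncorrelated with the covariate, which is the effect you want from this kind of correction, while the categorical version would need as many batch levels as there are cells.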
Dimensionality reduction and clustering
cds <- reduce_dimension(cds)
## No preprocess_method specified, and aligned coordinates have been computed previously. Using preprocess_method = 'Aligned'
plot_cells(cds, label_groups_by_cluster=FALSE, color_cells_by = "cao_cell_type")
## No trajectory to plot. Has learn_graph() been called yet?
## Warning: Removed 1 rows containing missing values (geom_text_repel).
# Once more, monocle's counterpart to Seurat::FeaturePlot()
ciliated_genes <- c("che-1", "hlh-17", "nhr-6", "dmd-6", "ceh-36", "ham-1")
plot_cells(cds, genes=ciliated_genes, label_cell_groups=FALSE, show_trajectory_graph=FALSE)
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
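# I find this color scheme ugly, and plot_cells() has no argument for changing it (since the result is a ggplot object, appending a ggplot2 color scale should still work)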
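As we discussed in the monocle2 lessons, cells can computationally be seen as moving continuously from one state to the next with no discrete boundary between them, but biologically not every pair of cell types can convert into one another. Monocle3 therefore does not assume that all cells in a dataset descend from a common transcriptional "ancestor", and the data you actually receive may contain several distinct trajectories. For example, in a tissue responding to infection, tissue-resident immune cells and stromal cells start from very different transcriptomes and respond to the infection very differently, so they should belong to different trajectories. For this reason, monocle3 does not use a single starting point when it performs pseudotime analysis.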
cds <- cluster_cells(cds)  # after clustering, each partition becomes its own "pseudotime trajectory"
plot_cells(cds, color_cells_by = "partition")
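In this plot the black lines are the backbone along which pseudotime runs. The numbered gray circles are the leaves of the trajectory branches, and they are separated by the black branch_points.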
3.2. Pseudotime analysis
cds <- learn_graph(cds)  # fits a principal graph within each partition; after this step the trajectory is essentially in place
plot_cells(cds, color_cells_by = "cao_cell_type", label_groups_by_cluster=FALSE, label_leaves=FALSE, label_branch_points=FALSE)
## Warning: Removed 1 rows containing missing values (geom_text_repel).
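# So many trajectories is a bit overwhelming; I still recommend processing the data in Seurat first and then running pseudotime analysis on a few selected cell types
plot_cells(cds, color_cells_by = "cao_cell_type", label_cell_groups=FALSE,
           label_leaves=TRUE,         # show the leaves
           label_branch_points=TRUE,  # show the branch points
           graph_label_size=1.5)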
cds <- order_cells(cds)
plot_cells(cds, color_cells_by = "pseudotime", label_cell_groups=FALSE, label_leaves=FALSE, label_branch_points=FALSE, graph_label_size=1.5)
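You can see many gray points in this plot. They belong to separate trajectories for which no root was chosen, so their pseudotime values are invalid. In other words, if you choose roots by hand, every trajectory needs its own root.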
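The gray points can be pictured with a toy example. The sketch below assumes that pseudotime behaves like a distance measured along the trajectory graph from the root; it is a base-R illustration on a made-up graph, not monocle3's implementation, and `graph_distance` is a hypothetical helper. Nodes in a component that contains no root are never reached and keep an infinite value.

```r
# Toy principal graph with two disconnected trajectories: a-b-c and d-e
edges <- rbind(c("a", "b"), c("b", "c"), c("d", "e"))
nodes <- unique(as.vector(edges))

# Breadth-first search from a single root; nodes never reached keep Inf
graph_distance <- function(edges, nodes, root) {
  dist <- setNames(rep(Inf, length(nodes)), nodes)
  dist[root] <- 0
  queue <- root
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    # neighbours of `current` in the undirected edge list
    neighbours <- c(edges[edges[, 1] == current, 2],
                    edges[edges[, 2] == current, 1])
    for (nb in neighbours) {
      if (is.infinite(dist[nb])) {
        dist[nb] <- dist[current] + 1
        queue <- c(queue, nb)
      }
    }
  }
  dist
}

toy_pseudotime <- graph_distance(edges, nodes, root = "a")
print(toy_pseudotime)
print(names(toy_pseudotime)[!is.finite(toy_pseudotime)])  # nodes without a valid value
```

Only the trajectory that contains the root gets finite values, which is why each separate trajectory needs a root of its own.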
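To be honest, this automatically popping interactive window drove me crazy while I was writing R Markdown, because a function with interactive behavior like this currently cannot be rendered into an HTML file. So, after studying the source code, I found the following way to choose the pseudotime root programmatically. The function below takes the principal-graph node that the largest number of your selected cells are closest to and uses it as the root.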
names(colData(cds))
## [1] "plate" "cao_cluster" "cao_cell_type" "cao_tissue"
## [5] "Size_Factor"
colData(cds)[,'cao_cell_type'] %>% unique()
## [1] "Unclassified neurons" "Germline"
## [3] "Intestinal/rectal muscle" "Vulval precursors"
## [5] "Coelomocytes" NA
## [7] "Ciliated sensory neurons" "Failed QC"
## [9] "Seam cells" "Non-seam hypodermis"
## [11] "Pharyngeal epithelia" "Touch receptor neurons"
## [13] "Body wall muscle" "Cholinergic neurons"
## [15] "Distal tip cells" "Other interneurons"
## [17] "GABAergic neurons" "Am/PH sheath cells"
## [19] "Pharyngeal muscle" "Pharyngeal neurons"
## [21] "Oxygen sensory neurons" "Somatic gonad precursors"
## [23] "flp-1(+) interneurons" "Canal associated neurons"
## [25] "Unclassified glia" "Pharyngeal gland"
## [27] "Sex myoblasts" "Excretory cells"
## [29] "Dopaminergic neurons" "Socket cells"
## [31] "Rectum"
colData(cds)[,'cao_cell_type'] %>% table()
## .
## Am/PH sheath cells Body wall muscle Canal associated neurons
## 421 10508 239
## Cholinergic neurons Ciliated sensory neurons Coelomocytes
## 1015 842 1358
## Distal tip cells Dopaminergic neurons Excretory cells
## 129 70 155
## Failed QC flp-1(+) interneurons GABAergic neurons
## 3483 224 400
## Germline Intestinal/rectal muscle Non-seam hypodermis
## 5144 338 1268
## Other interneurons Oxygen sensory neurons Pharyngeal epithelia
## 443 305 747
## Pharyngeal gland Pharyngeal muscle Pharyngeal neurons
## 271 332 314
## Rectum Seam cells Sex myoblasts
## 121 3523 302
## Socket cells Somatic gonad precursors Touch receptor neurons
## 358 355 334
## Unclassified glia Unclassified neurons Vulval precursors
## 208 2639 488
get_earliest_principal_node <- function(cds, my_select="Am/PH sheath cells"){
  # cells annotated with the chosen cell type
  cell_ids <- which(colData(cds)[, "cao_cell_type"] == my_select)
  # for every cell, the index of its closest principal-graph node
  closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
  closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
  # the node that most of the selected cells are closest to
  root_pr_nodes <- igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names(which.max(table(closest_vertex[cell_ids,]))))]
  root_pr_nodes
}
cds <- order_cells(cds, root_pr_nodes=get_earliest_principal_node(cds))

# Same logic, with the metadata column and the cell type both passed as arguments
myselect <- function(cds, select.classify, my_select){
  cell_ids <- which(colData(cds)[,select.classify] == my_select)
  closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
  closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
  root_pr_nodes <- igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names(which.max(table(closest_vertex[cell_ids,]))))]
  root_pr_nodes
}
cds <- order_cells(cds, root_pr_nodes=myselect(cds, select.classify = 'cao_cell_type', my_select = "Body wall muscle"))
# No problem, it works
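The core of the helper is the expression `which.max(table(...))`, which is easier to follow on toy data. The sketch below is plain base R with made-up inputs: `pick_root`, the six cells, and the `Y_1` to `Y_3` node names are all hypothetical stand-ins for the real cds slots.

```r
# Hypothetical toy data: 6 cells, each mapped to the index of its nearest graph node
cell_type <- c("muscle", "muscle", "muscle", "neuron", "neuron", "muscle")
closest_vertex <- matrix(c(3, 3, 1, 2, 2, 3), ncol = 1,
                         dimnames = list(paste0("cell", 1:6), NULL))
node_names <- c("Y_1", "Y_2", "Y_3")   # stand-in for igraph::V(graph)$name

pick_root <- function(cell_type, closest_vertex, node_names, my_select) {
  cell_ids <- which(cell_type == my_select)    # cells of the chosen type
  votes <- table(closest_vertex[cell_ids, ])   # number of those cells per node index
  node_names[as.numeric(names(which.max(votes)))]
}

root <- pick_root(cell_type, closest_vertex, node_names, "muscle")
print(root)
```

Three of the four "muscle" cells sit closest to node 3, so that node wins the vote. Note that `which.max` returns the first maximum, so ties are broken in favor of the lowest node index.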
As the figure below shows, once we specify "Body wall muscle" as the starting point, this group of cells takes the purple "zero" value.
library(patchwork)  # assumed source of the `|` operator that places two ggplots side by side
plot_cells(cds, color_cells_by = "pseudotime", label_cell_groups=FALSE, label_leaves=FALSE, label_branch_points=FALSE, graph_label_size=1.5) |
  plot_cells(cds, color_cells_by = "cao_cell_type", label_cell_groups=FALSE, label_leaves=FALSE, label_branch_points=FALSE, graph_label_size=1.5)
Plotting genes along pseudotime
plot_genes_in_pseudotime(cds[1:5,], color_cells_by="cao_cell_type", min_expr=0.5)
There is also a 3D version of the trajectory, so let's try it
cds_3d <- reduce_dimension(cds, max_components = 3)
## No preprocess_method specified, and aligned coordinates have been computed previously. Using preprocess_method = 'Aligned'
cds_3d <- cluster_cells(cds_3d)
cds_3d <- learn_graph(cds_3d)
## Warning in igraph::graph.dfs(stree_ori, root = root_cell, neimode = "all", :
## Argument `neimode' is deprecated; use `mode' instead
## Warning in louvain_clustering(data, pd[row.names(data), ], k = k, weight = weight, : RANN counts the point itself, k must be smaller than
## the total number of points - 1 (all other points) - 1 (itself)!
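# Order the 3D object itself, looking up the root on its own graph, so that cds_3d carries pseudotime values
cds_3d <- order_cells(cds_3d, root_pr_nodes=myselect(cds_3d, select.classify = 'cao_cell_type', my_select = "Body wall muscle"))
cds_3d_plot_obj <- plot_cells_3d(cds_3d, color_cells_by="cao_cell_type")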
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
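Unfortunately, coloring by pseudotime did not work for me with this function. Note that it would need pseudotime values computed on cds_3d itself via order_cells().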
cds_3d_plot_obj
![image.png](https://upload-images.jianshu.io/upload_images/28196887-9df635947263e119.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
Feel free to follow the WeChat official account of the same name~