单细胞Day8-1-单样本拟时序

1.定义

做拟时序分析是为了探索自己感兴趣的几种细胞之间的发育关系，一般不是用全部类型的细胞来做的。以下是进行拟时序分析的几个主要目的：细胞状态推断、细胞轨迹追踪、细胞命运预测、动态过程建模、细胞亚群识别、基因调控网络推断、疾病机理探索、药物作用机制研究。
输入数据是seurat做完降维聚类分群注释的数据，从调用之前注释好的数据开始

rm(list = ls())
library(Seurat)
library(monocle)
library(dplyr)
load("sce.Rdata")
scRNA = sce
table(Idents(scRNA))
head(scRNA@meta.data)
DimPlot(scRNA,label = T)
scRNA$celltype = Idents(scRNA)  #celltype是细胞类型注释，添加到数据中

做拟时序分析通常不是拿全部的细胞，而是拿感兴趣的一部分。用subset提取子集即可。因为要使用差异基因来排序，所以要两类及以上细胞。基于背景知识选择有进化关系的细胞类型。

levels(Idents(scRNA)) #打出来细胞类型供复制
scRNA = subset(scRNA,idents = c("NK","CD8 T"))
table(Idents(scRNA))
set.seed(1234)      #抽样，实战时不能抽样！！！！！
scRNA = subset(scRNA,downsample = 100)

因为monocle和seurat是两个不同的体系，所以需要将seurat对象转换为monocle可以接受的CellDataSet对象。虽然monocle3已经出来很久了，但大家都不约而同的选择monocle2，大概就是习惯了吧。。

2.创建CellDataSet对象

以下代码无需改动

# count矩阵，官方建议用count
ct <- scRNA@assays$RNA$counts
# 基因注释
gene_ann <- data.frame(
  gene_short_name = row.names(ct), 
  row.names = row.names(ct)
)
fd <- new("AnnotatedDataFrame",
          data=gene_ann)
# 临床信息
pd <- new("AnnotatedDataFrame",
          data=scRNA@meta.data)
#新建CellDataSet对象
sc_cds <- newCellDataSet(
  ct, 
  phenoData = pd,
  featureData =fd,
  expressionFamily = negbinomial.size(),
  lowerDetectionLimit=1)
sc_cds

3.构建细胞发育轨迹

1.使用seurat给出的高变化基因
2.按照平均表达量大于某个数字(比如0.1，官网用的是这个)的基因。
3.使用不同细胞类型之间的差异基因，differentialGeneTest计算。
根据以往经验，选择使用最后一个策略效果可能会更好。
以下代码无需改动

sc_cds <- estimateSizeFactors(sc_cds)
sc_cds <- estimateDispersions(sc_cds)
fdif = "diff_test_res.Rdata"
if(!file.exists(fdif)){
  diff_test_res <- differentialGeneTest(sc_cds,
                                        fullModelFormulaStr = "~celltype",
                                        cores = 4)
  save(diff_test_res,file = fdif)
}
load(fdif)
ordering_genes <- row.names(subset(diff_test_res, qval < 0.01))
#查看基因，筛选适合用于排序的，设置为排序要使用的基因
head(ordering_genes)
length(ordering_genes)
sc_cds <- setOrderingFilter(sc_cds, ordering_genes)
#画出选择的基因
plot_ordering_genes(sc_cds)

plot_ordering_genes画的图纵坐标是基因表达量的变异性（离散性）,横坐标是每个基因在所有细胞种的平均表达量。
红线表示Monocle2基于横纵坐标的关系拟合的变异期望值，为用于排序的基因是黑色，其他基因是灰色

image.png

#降维
sc_cds <- reduceDimension(sc_cds)
#细胞排序
sc_cds <- orderCells(sc_cds)

4.绘图展示

4.1 发育轨迹图

library(ggsci)
p1 = plot_cell_trajectory(sc_cds)+ scale_color_nejm()
p2 = plot_cell_trajectory(sc_cds, color_by = 'Pseudotime') 
p3 = plot_cell_trajectory(sc_cds, color_by = 'celltype')  + scale_color_npg()
library(patchwork)
p2+p1/p3

image.png

Pseudotime数值从小到大就是顺序就是推断的发育顺序。图上的点颜色越深，时间越早，颜色越浅，时间越晚。
state是发育的不同阶段，数值越小越靠前。
celltype则可以看到具体的细胞类型在时间轨迹图上的分布。

4.2 经典的拟时序热图

展示了一些基因是如何随着时间轨迹的变化而渐变的，这个渐变不同于findmarkers，是体现变化过程的，而不是直接给出差异表达的基因。
理论上应该是一句代码一行，如果两句代码想放在同一行上就要用;隔开。可以减少点Run或者按快捷键的次数。
选择了q值最小的50个基因来画热图。数量可以按需调整（head(50)），其他代码不用改。

gene_to_cluster = diff_test_res %>% arrange(qval) %>% head(50) %>% pull(gene_short_name);head(gene_to_cluster)
plot_pseudotime_heatmap(sc_cds[gene_to_cluster,],
                        num_clusters = nlevels(Idents(scRNA)),   #是把基因聚成几簇，推荐是由几类细胞就设置几，用nlevels实现了自动化
                        show_rownames = TRUE,    #是把3种颜色切割成渐变的100种颜色
                        cores = 4,return_heatmap = TRUE,
                        hmcols = colorRampPalette(c("navy", "white", "firebrick3"))(100))

image.png

4.3 基因轨迹图

用感兴趣的基因给轨迹图着色

gs = head(gene_to_cluster)     #感兴趣的基因
plot_cell_trajectory(sc_cds,markers=gs,
                     use_color_gradient=T)

image.png

4.4 基因拟时序点图

plot_genes_in_pseudotime(sc_cds[gs,],
                 color_by = "celltype",
                 nrow= 3, #6个基因所以排了3行，数量有变化时要改
                 ncol = NULL )

横坐标是按照pseudotime排好顺序的。

image.png

改成2个B细胞群："naive B","plasma B"

改细胞群重新跑的时候，创建CellDataSet对象出bug

image.png

解决方法：更新igraph包
ps：另一个装包error

install.packages("igraph")
Error in install.packages : Updating loaded packages

这种情况通常发生在某些需要更新的包当前正在使用。需要重启R，在什么包都没加载的时候再更新。
切换细胞子集重跑结果：

image.png

单细胞Day8-1-单样本拟时序

1.定义

2.创建CellDataSet对象

3.构建细胞发育轨迹

4.绘图展示

4.1 发育轨迹图

4.2 经典的拟时序热图

4.3 基因轨迹图

4.4 基因拟时序点图

改成2个B细胞群："naive B","plasma B"

推荐阅读更多精彩内容