输入文件导入方法
Monocle拟时间导入数据方式有两种,一种是通过导入Seurat对象,一种是导入cellranger输出的表达矩阵(三个文件barcodes.tsv genes.tsv matrix.mtx)
#今天就介绍cellranger输出的表达矩阵导入方式
library(monocle)
library(cellrangerRkit)
library(Seurat)
indir='mydir/filtered_gene_bc_matrices/Danio_rerio/'
my_dir <- indir
gbm <- load_cellranger_matrix(my_dir)
my_feat <- fData(gbm)
names(my_feat) <- c('id', 'gene_short_name')
my_cds <- newCellDataSet(exprs(gbm),phenoData = new("AnnotatedDataFrame", data = pData(gbm)),featureData = new("AnnotatedDataFrame", data = my_feat),lowerDetectionLimit = 0.5,expressionFamily = negbinomial.size())
选定细胞或者样品进行拟时间分析
一般情况下,如果做单细胞转录组测序的话,都会用seurat软件包进行分析。经常会遇到这样的情况,我们只是对seurat分类结果中的几个聚类细胞感兴趣,想用几个聚类的细胞进行拟时间分析,这该如何处理?第一种方式就是导入seurat对象的时候,可以提前特定细胞进行后续分析;第二种方式就是通过seurat的聚类结果文件,提前相应的表达矩阵,转换成cellranger输出的三个文件的表达矩阵,这个方式太麻烦。
今天将介绍另外一种方法,特别的鉴定,就是直接通过monocle对象,提取相应的细胞对象即可:
my_cds_subset <- my_cds[expressed_genes, cells]
今天仅仅记录此方法,后续分析以后再进行记录。
2019年5月13日
拟时间分支点数目
有时候需要查看分支点的数目,但是分支点的数字代表什么意思还需要再研究。
cds@auxOrderingData[[cds@dim_reduce_type]]$branch_points
cds@auxOrderingData[["DDRTree"]]$pr_graph_cell_proj_closest_vertex
# Cell id to closest MST node.
id_to_mst_node <- cds@auxOrderingData[["DDRTree"]]$pr_graph_cell_proj_closest_vertex
# Change the MST node to match 'name' attribute in igraph ("Y_*").
id_to_mst_node[,1] <- sapply(id_to_mst_node[,1], function(x){paste(c("Y_", x), collapse = "")})
# Make a MST node to cell ids mapping.
mst_node_to_ids <- list()
for (node_name in unique(id_to_mst_node[,1])) {
mst_node_to_ids[[node_name]] <- names(id_to_mst_node[,1][id_to_mst_node[,1] == node_name])
}
##所有分支点名称
igraph::vertex_attr(minSpanningTree(cds))
想要更多的分支点和state
使用max_components参数,increasing the number for "max_components" Monocle will generate more branch points, more states and trajectories。
state值不连续属于正常
Monocle 2 learns a principal graph for the centroids of the raw data cloud and then projects the raw data points on to this principal graph. the missing states happen when the cell in the transition state are very few and those points are projected to nearby branches. This should not affect any downstream analysis. And we will have better ways to handle this in next Monocle release.
seurat3.0对象导入
seurat升级到3.0以后,importcds函数可能不能用了,需要新的手动导入方式:
#Load Seurat object
seurat_object <- readRDS('seurat_object.rds')
#Extract data, phenotype data, and feature data from the SeuratObject
data <- as(as.matrix(seurat_object@assays$RNA@data), 'sparseMatrix')
pd <- new('AnnotatedDataFrame', data = seurat_object@meta.data)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
fd <- new('AnnotatedDataFrame', data = fData)
#Construct monocle cds
monocle_cds <- newCellDataSet(data,
phenoData = pd,
featureData = fd,
lowerDetectionLimit = 0.5,
expressionFamily = negbinomial.size())
拟时间分析热图的top50基因输出
我们通过拟时间值,对差异前50个基因进行热图分析,得到如下热图,那么热图的top50基因如何输出呢?
graph <- paste(prefix,'pseudotime_heatmap.pdf',sep='_')
pdf(graph, w=12, h=8)
my_pseudotime_cluster <-plot_pseudotime_heatmap(HSMM_myo[gene_to_cluster,],num_clusters = 3,cores = 4,show_rownames = TRUE,return_heatmap = TRUE)
#plot_grid(my_pseudotime_cluster)
dev.off()
print('Finished -plot_pseudotime_heatmap............')
######
#####
## hierarchical clustering was used to cluster the genes
# we can cut the dendrogram to form the same 3 clusters as plot_pseudotime_heatmap
my_cluster <- cutree(my_pseudotime_cluster$tree_row, 3)
print('my_cluster')
my_cluster
# genes in cluster 1
cluster_1<-my_pseudotime_de[names(my_cluster[my_cluster == 1]),"gene_short_name"]
# genes in cluster 2
cluster_2<-my_pseudotime_de[names(my_cluster[my_cluster == 2]),"gene_short_name"]
# genes in cluster 3
cluster_3<-my_pseudotime_de[names(my_cluster[my_cluster == 3]),"gene_short_name"]
tmpe2<-data.frame(Gene=c(cluster_1,cluster_2,cluster_3),Cluster=c(rep('cluster1',length(cluster_1)),rep('cluster2',length(cluster_2)),rep('cluster3',length(cluster_3))))
write.csv(tmpe2,paste(prefix,'pseudotime_heatmap_genes.csv',sep="_"),quote=F,row.names=F)
最终会得到如下表格:
2019年5月20日