参考教程链接:
- https://github.com/dmcable/spacexr
- 官方教程: https://github.com/dmcable/spacexr/tree/master/vignettes
- 文献解读:10X单细胞空间联合分析之十(RCTD) https://www.jianshu.com/p/92a6df0dcb1b
前面学习了RCTD
今天先来学习Full mode: full mode on Visium hippocampushttps://raw.githack.com/dmcable/spacexr/master/vignettes/spatial-transcriptomics.html). full mode可以用于100-micron resolution Visium数据。
使用CSIDE来分析Visium空间转录组数据。
环境准备
library(spacexr)
library(Matrix)
library(doParallel)
library(ggplot2)
# 示例数据文件夹 # directory for sample Visium dataset
datadir <- system.file("extdata",'SpatialRNA/VisiumVignette',package = 'spacexr')
if(!dir.exists(datadir))
dir.create(datadir)
# 输出文件夹,可以换成自己指定的
savedir <- 'RCTD_results'
if(!dir.exists(savedir))
dir.create(savedir)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
cache = TRUE,
out.width = "100%"
)
Introduction
本次使用数据为hippocampus Visium数据,首先,使用一个已经注释好的单细胞数据海马体数据集对空间数据用RCTD进行细胞注释,然后做区域差异表达分析。
数据预处理和运行RCTD
full mode可以对每个spots反卷积成任意细胞数,对于Visium空间数据比较合适。
首先,读取空间数据,需要的是counts矩阵与空间坐标
### Load in/preprocess your data, this might vary based on your file type
# load in counts matrix
counts <- as.data.frame(readr::read_csv(file.path(datadir,"counts.csv")))
coords <- read.csv(file.path(datadir,"coords.csv"))
rownames(counts) <- counts[,1]; counts[,1] <- NULL
rownames(coords) <- coords[,1]; coords[,1] <- NULL
# In this case, total counts per pixel is nUMI
nUMI <- colSums(counts)
puck <- SpatialRNA(coords, counts, nUMI)
barcodes <- colnames(puck@counts)
p <- plot_puck_continuous(puck, barcodes, puck@nUMI, ylimit = c(0,round(quantile(puck@nUMI,0.9))),
title ='plot of nUMI', size=4.5, alpha=0.8)
ggsave(paste0(savedir, "/Spaital_nUNI_Plot.png"), width=8, height=6, plot=p,bg="white")
空间每个spot中表达的nUMI值:
看一下构建好的SpatialRNA对象
str(puck)
Formal class 'SpatialRNA' [package "spacexr"] with 3 slots
..@ coords:'data.frame': 313 obs. of 2 variables:
.. ..$ x: int [1:313] 4125 4125 4604 3047 4125 3047 4005 4125 3646 4484 ...
.. ..$ y: int [1:313] 5477 3000 4101 3895 6028 4858 5684 3964 5752 3069 ...
..@ counts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. .. ..@ i : int [1:85720] 0 1 2 3 4 5 6 7 8 9 ...
.. .. ..@ p : int [1:314] 0 275 551 827 1101 1377 1650 1926 2202 2478 ...
.. .. ..@ Dim : int [1:2] 276 313
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : chr [1:276] "Rpl7" "Cox5b" "Rpl31" "Rpl37a" ...
.. .. .. ..$ : chr [1:313] "AAAGGGATGTAGCAAG-1" "AAAGGGCAGCTTGAAT-1" "AACAACTGGTAGTTGC-1" "AACCCAGAGACGGAGA-1" ...
.. .. ..@ x : num [1:85720] 8 17 8 11 7 8 6 4 9 15 ...
.. .. ..@ factors : list()
..@ nUMI : Named num [1:313] 4818 13002 7156 5612 5844 ...
.. ..- attr(*, "names")= chr [1:313] "AAAGGGATGTAGCAAG-1" "AAAGGGCAGCTTGAAT-1" "AACAACTGGTAGTTGC-1" "AACCCAGAGACGGAGA-1" ...
其次,读取单细胞参考数据。
单细胞需要的是counts矩阵,以及做好的细胞注释信息,和nUMI数据。
# directory for the reference
refdir <- system.file("extdata",'Reference/Visium_Ref',package = 'spacexr')
# load in cell types
counts <- read.csv(file.path(refdir,"counts.csv"))
rownames(counts) <- counts[,1]; counts[,1] <- NULL
# load in cell types
cell_types <- read.csv(file.path(refdir,"cell_types.csv"))
cell_types <- setNames(cell_types[[2]], cell_types[[1]])
cell_types <- as.factor(cell_types)
# load in cell types
nUMI <- read.csv(file.path(refdir,"nUMI.csv"))
nUMI <- setNames(nUMI[[2]], nUMI[[1]])
reference <- Reference(counts, cell_types, nUMI)
简单看一下数据:原始counts
counts[1:4, 1:4]
细胞类型:
head(cell_types)
nUMI数据:
head(nUMI)
参考对象:
str(reference)
Formal class 'Reference' [package "spacexr"] with 3 slots
..@ cell_types: Factor w/ 17 levels "Astrocyte","CA1",..: 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "names")= chr [1:510] "P60HippoRep6P2_GACTTGAAATGC" "P60HippoRep2P2_ACTGAAGAATTN" "P60HippoRep2P2_CCTTAGCCCTGG" "P60HippoRep5P3_GTCAAGGCGGGC" ...
..@ counts :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. .. ..@ i : int [1:69901] 1 2 4 9 11 12 16 22 23 26 ...
.. .. ..@ p : int [1:511] 0 112 215 384 566 668 746 880 1032 1163 ...
.. .. ..@ Dim : int [1:2] 307 510
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : chr [1:307] "Acta2" "Actb" "Ahi1" "Aldoa" ...
.. .. .. ..$ : chr [1:510] "P60HippoRep6P2_GACTTGAAATGC" "P60HippoRep2P2_ACTGAAGAATTN" "P60HippoRep2P2_CCTTAGCCCTGG" "P60HippoRep5P3_GTCAAGGCGGGC" ...
.. .. ..@ x : num [1:69901] 4 1 6 2 1 24 10 1 1 1 ...
.. .. ..@ factors : list()
..@ nUMI : Named int [1:510] 996 1199 2313 3929 884 586 1378 1768 1486 1331 ...
.. ..- attr(*, "names")= chr [1:510] "P60HippoRep6P2_GACTTGAAATGC" "P60HippoRep2P2_ACTGAAGAATTN" "P60HippoRep2P2_CCTTAGCCCTGG" "P60HippoRep5P3_GTCAAGGCGGGC" ...
最后,运行RCTD
myRCTD <- create.RCTD(puck, reference, max_cores = 2)
myRCTD <- run.RCTD(myRCTD, doublet_mode = 'full')
saveRDS(myRCTD,file.path(savedir,'myRCTD_visium_full.rds'))
看看myRCTD的结构:
str(myRCTD)
Formal class 'RCTD' [package "spacexr"] with 9 slots
..@ spatialRNA :Formal class 'SpatialRNA' [package "spacexr"] with 3 slots
.. .. ..@ coords:'data.frame': 313 obs. of 2 variables:
.. .. .. ..$ x: int [1:313] 4125 4125 4604 3047 4125 3047 4005 4125 3646 4484 ...
.. .. .. ..$ y: int [1:313] 5477 3000 4101 3895 6028 4858 5684 3964 5752 3069 ...
.. .. ..@ counts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. .. .. .. ..@ i : int [1:37939] 0 1 2 3 4 5 6 7 8 9 ...
.. .. .. .. ..@ p : int [1:314] 0 122 244 366 488 610 730 852 974 1096 ...
.. .. .. .. ..@ Dim : int [1:2] 122 313
.. .. .. .. ..@ Dimnames:List of 2
.. .. .. .. .. ..$ : chr [1:122] "Aldoc" "Apoe" "Atp1a2" "Atp5f1" ...
.. .. .. .. .. ..$ : chr [1:313] "AAAGGGATGTAGCAAG-1" "AAAGGGCAGCTTGAAT-1" "AACAACTGGTAGTTGC-1" "AACCCAGAGACGGAGA-1" ...
.. .. .. .. ..@ x : num [1:37939] 10 18 8 7 6 25 7 25 50 4 ...
.. .. .. .. ..@ factors : list()
.. .. ..@ nUMI : Named num [1:313] 4818 13002 7156 5612 5844 ...
.. .. .. ..- attr(*, "names")= chr [1:313] "AAAGGGATGTAGCAAG-1" "AAAGGGCAGCTTGAAT-1" "AACAACTGGTAGTTGC-1" "AACCCAGAGACGGAGA-1" ...
..@ originalSpatialRNA:Formal class 'SpatialRNA' [package "spacexr"] with 3 slots
.. .. ..@ coords:'data.frame': 313 obs. of 2 variables:
.. .. .. ..$ x: int [1:313] 4125 4125 4604 3047 4125 3047 4005 4125 3646 4484 ...
.. .. .. ..$ y: int [1:313] 5477 3000 4101 3895 6028 4858 5684 3964 5752 3069 ...
.. .. ..@ counts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. .. .. .. ..@ i : int [1:85720] 0 1 2 3 4 5 6 7 8 9 ...
.. .. .. .. ..@ p : int [1:314] 0 275 551 827 1101 1377 1650 1926 2202 2478 ...
.. .. .. .. ..@ Dim : int [1:2] 276 313
.. .. .. .. ..@ Dimnames:List of 2
.. .. .. .. .. ..$ : chr [1:276] "Rpl7" "Cox5b" "Rpl31" "Rpl37a" ...
.. .. .. .. .. ..$ : chr [1:313] "AAAGGGATGTAGCAAG-1" "AAAGGGCAGCTTGAAT-1" "AACAACTGGTAGTTGC-1" "AACCCAGAGACGGAGA-1" ...
.. .. .. .. ..@ x : num [1:85720] 8 17 8 11 7 8 6 4 9 15 ...
.. .. .. .. ..@ factors : list()
.. .. ..@ nUMI : Named num [1:313] 4818 13002 7156 5612 5844 ...
.. .. .. ..- attr(*, "names")= chr [1:313] "AAAGGGATGTAGCAAG-1" "AAAGGGCAGCTTGAAT-1" "AACAACTGGTAGTTGC-1" "AACCCAGAGACGGAGA-1" ...
..@ reference :Formal class 'Reference' [package "spacexr"] with 3 slots
.. .. ..@ cell_types: Factor w/ 17 levels "Astrocyte","CA1",..: 1 1 1 1 1 2 2 2 2 2 ...
.. .. .. ..- attr(*, "names")= chr [1:85] "P60HippoRep3P1_ATTATACTTGCG" "P60HippoRep1P2_TCACCCTCAAGC" "P60HippoRep2P2_CCTTAGCCCTGG" "P60HippoRep4P3_CTCCCCTTAATC" ...
.. .. ..@ counts :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. .. .. .. ..@ i : int [1:12442] 1 2 4 10 12 13 16 21 23 25 ...
.. .. .. .. ..@ p : int [1:86] 0 95 262 431 653 756 905 1097 1298 1551 ...
.. .. .. .. ..@ Dim : int [1:2] 307 85
.. .. .. .. ..@ Dimnames:List of 2
.. .. .. .. .. ..$ : chr [1:307] "Acta2" "Actb" "Ahi1" "Aldoa" ...
.. .. .. .. .. ..$ : chr [1:85] "P60HippoRep3P1_ATTATACTTGCG" "P60HippoRep1P2_TCACCCTCAAGC" "P60HippoRep2P2_CCTTAGCCCTGG" "P60HippoRep4P3_CTCCCCTTAATC" ...
.. .. .. .. ..@ x : num [1:12442] 2 1 10 2 16 1 11 1 2 1 ...
.. .. .. .. ..@ factors : list()
.. .. ..@ nUMI : Named int [1:85] 755 1936 2313 3614 1199 1414 2512 4002 8253 5677 ...
.. .. .. ..- attr(*, "names")= chr [1:85] "P60HippoRep3P1_ATTATACTTGCG" "P60HippoRep1P2_TCACCCTCAAGC" "P60HippoRep2P2_CCTTAGCCCTGG" "P60HippoRep4P3_CTCCCCTTAATC" ...
..@ config :List of 22
.. ..$ gene_cutoff : num 0.000125
.. ..$ fc_cutoff : num 0.5
.. ..$ gene_cutoff_reg : num 2e-04
.. ..$ fc_cutoff_reg : num 0.75
.. ..$ UMI_min : num 100
.. ..$ UMI_min_sigma : num 300
.. ..$ max_cores : num 2
.. ..$ N_epoch : num 8
.. ..$ N_X : num 50000
.. ..$ K_val : num 100
.. ..$ N_fit : num 1000
.. ..$ N_epoch_bulk : num 30
.. ..$ MIN_CHANGE_BULK : num 1e-04
.. ..$ MIN_CHANGE_REG : num 0.001
.. ..$ UMI_max : num 2e+07
.. ..$ counts_MIN : num 10
.. ..$ MIN_OBS : num 3
.. ..$ MAX_MULTI_TYPES : num 4
.. ..$ CONFIDENCE_THRESHOLD: num 10
.. ..$ DOUBLET_THRESHOLD : num 25
.. ..$ RCTDmode : chr "full"
.. ..$ doublet_mode : chr "full"
..@ cell_type_info :List of 2
.. ..$ info :List of 3
.. .. ..$ :'data.frame': 307 obs. of 17 variables:
.. .. .. ..$ Astrocyte : num [1:307] 2.31e-05 2.53e-03 2.09e-04 7.12e-04 8.77e-03 ...
.. .. .. ..$ CA1 : num [1:307] 0.00 2.85e-03 9.55e-04 1.08e-03 8.91e-05 ...
.. .. .. ..$ CA3 : num [1:307] 0 0.001891 0.000362 0.001063 0.0001 ...
.. .. .. ..$ Cajal_Retzius : num [1:307] 0.00 2.51e-03 7.78e-04 1.25e-03 3.78e-05 ...
.. .. .. ..$ Choroid : num [1:307] 1.96e-06 1.53e-03 7.45e-05 9.64e-04 3.55e-05 ...
.. .. .. ..$ Denate : num [1:307] 0.00 3.95e-03 1.06e-03 1.43e-03 9.47e-05 ...
.. .. .. ..$ Endothelial_Stalk : num [1:307] 0.00 6.15e-03 7.55e-05 2.48e-04 2.73e-04 ...
.. .. .. ..$ Endothelial_Tip : num [1:307] 0 0.003204 0.000312 0.000223 0.000186 ...
.. .. .. ..$ Entorihinal : num [1:307] 0 0.001853 0.001246 0.001098 0.000276 ...
.. .. .. ..$ Ependymal : num [1:307] 0.000925 0.003346 0.000247 0.00041 0.000506 ...
.. .. .. ..$ Interneuron : num [1:307] 0 0.001779 0.001251 0.001012 0.000114 ...
.. .. .. ..$ Microglia_Macrophages: num [1:307] 0 0.008027 0.000361 0.000502 0.000234 ...
.. .. .. ..$ Mural : num [1:307] 0.010468 0.009888 0.000145 0.001087 0.000281 ...
.. .. .. ..$ Neurogenesis : num [1:307] 0 0.007595 0.000314 0.000104 0.000133 ...
.. .. .. ..$ Neuron.Slc17a6 : num [1:307] 3.99e-05 2.01e-03 1.54e-03 1.60e-03 1.12e-04 ...
.. .. .. ..$ Oligodendrocyte : num [1:307] 0.00 5.35e-03 5.36e-05 6.72e-04 4.75e-04 ...
.. .. .. ..$ Polydendrocyte : num [1:307] 0 0.004245 0.00023 0.000265 0.000367 ...
.. .. ..$ : chr [1:17] "Astrocyte" "CA1" "CA3" "Cajal_Retzius" ...
.. .. ..$ : int 17
.. ..$ renorm:List of 3
.. .. ..$ :'data.frame': 122 obs. of 17 variables:
.. .. .. ..$ Astrocyte : num [1:122] 0.03946 0.06151 0.02067 0.00247 0.00384 ...
.. .. .. ..$ CA1 : num [1:122] 0.000401 0.000483 0.000393 0.000771 0.000516 ...
.. .. .. ..$ CA3 : num [1:122] 4.52e-04 2.87e-04 6.84e-05 7.67e-04 1.66e-04 ...
.. .. .. ..$ Cajal_Retzius : num [1:122] 0.00017 0.000304 0.000933 0.0019 0.001271 ...
.. .. .. ..$ Choroid : num [1:122] 0.00016 0.01603 0.00105 0.00276 0.00112 ...
.. .. .. ..$ Denate : num [1:122] 0.000426 0.00125 0.000161 0.001855 0.000682 ...
.. .. .. ..$ Endothelial_Stalk : num [1:122] 0.00123 0.005732 0.000646 0.001426 0.001425 ...
.. .. .. ..$ Endothelial_Tip : num [1:122] 0.000838 0.028824 0.016866 0.000922 0.002654 ...
.. .. .. ..$ Entorihinal : num [1:122] 1.24e-03 3.07e-04 7.69e-05 1.43e-03 5.16e-04 ...
.. .. .. ..$ Ependymal : num [1:122] 0.00228 0.02773 0.00143 0.00101 0.00151 ...
.. .. .. ..$ Interneuron : num [1:122] 0.000515 0.00066 0.000194 0.001144 0.000779 ...
.. .. .. ..$ Microglia_Macrophages: num [1:122] 0.001052 0.012869 0.000131 0.001145 0.003646 ...
.. .. .. ..$ Mural : num [1:122] 0.00126 0.00283 0.0084 0.0015 0.00267 ...
.. .. .. ..$ Neurogenesis : num [1:122] 0.000601 0.001941 0.000344 0.001637 0.000988 ...
.. .. .. ..$ Neuron.Slc17a6 : num [1:122] 0.000505 0.000372 0.000115 0.001564 0.00067 ...
.. .. .. ..$ Oligodendrocyte : num [1:122] 0.002139 0.010166 0.000455 0.001387 0.003402 ...
.. .. .. ..$ Polydendrocyte : num [1:122] 0.00165 0.01303 0.00225 0.00153 0.00394 ...
.. .. ..$ : chr [1:17] "Astrocyte" "CA1" "CA3" "Cajal_Retzius" ...
.. .. ..$ : int 17
..@ internal_vars :List of 8
.. ..$ gene_list_reg : chr [1:102] "Aldoc" "Apoe" "Atp1a2" "Cd81" ...
.. ..$ gene_list_bulk : chr [1:122] "Aldoc" "Apoe" "Atp1a2" "Atp5f1" ...
.. ..$ proportions : Named num [1:17] 1.21e-01 2.54e-07 2.60e-01 5.61e-01 9.39e-03 ...
.. .. ..- attr(*, "names")= chr [1:17] "Astrocyte" "CA1" "CA3" "Cajal_Retzius" ...
.. ..$ class_df :'data.frame': 17 obs. of 1 variable:
.. .. ..$ class: chr [1:17] "Astrocyte" "CA1" "CA3" "Cajal_Retzius" ...
.. ..$ cell_types_assigned: logi TRUE
.. ..$ sigma : num 0.21
.. ..$ Q_mat : num [1:103, 1:2536] 1.00 1.43e-05 1.60e-06 9.72e-07 6.88e-07 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:103] "V1" "V2" "V3" "V4" ...
.. .. .. ..$ : NULL
.. ..$ X_vals : num [1:2536] 1.00e-05 2.83e-05 5.20e-05 8.00e-05 1.12e-04 ...
..@ results :List of 1
.. ..$ weights:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. .. .. ..@ i : int [1:5321] 0 1 2 3 4 5 6 7 8 9 ...
.. .. .. ..@ p : int [1:18] 0 313 626 939 1252 1565 1878 2191 2504 2817 ...
.. .. .. ..@ Dim : int [1:2] 313 17
.. .. .. ..@ Dimnames:List of 2
.. .. .. .. ..$ : chr [1:313] "AAAGGGATGTAGCAAG-1" "AAAGGGCAGCTTGAAT-1" "AACAACTGGTAGTTGC-1" "AACCCAGAGACGGAGA-1" ...
.. .. .. .. ..$ : chr [1:17] "Astrocyte" "CA1" "CA3" "Cajal_Retzius" ...
.. .. .. ..@ x : num [1:5321] 0.0436 0.0113 0.0473 0.0394 0.0574 ...
.. .. .. ..@ factors : list()
..@ de_results : list()
..@ internal_vars_de : list()
探索一下RCTD结果
RCTD full mode的结果保存在@results$weights中。接下来使用normalize_weights函数标化这个权重,使得每个spots中的各种细胞类型比例加起来等于1。
## result
myRCTD <- readRDS(file.path(savedir,'myRCTD_visium_full.rds'))
barcodes <- colnames(myRCTD@spatialRNA@counts)
weights <- myRCTD@results$weights
norm_weights <- normalize_weights(weights)
看一下norm_weights:每一行代表一个spots,每一列代表一个细胞类型,值表示这个spot中细胞类型所占的比例。每一行的和为1。
# observe weight values
cell_types <- c('Denate', 'Neurogenesis','Cajal_Retzius')
print(head(norm_weights[,cell_types]))
head(norm_weights)
6 x 17 Matrix of class "dgeMatrix"
Astrocyte CA1 CA3 Cajal_Retzius
AAAGGGATGTAGCAAG-1 0.05574455 1.749977e-04 1.749977e-04 0.06513845
AAAGGGCAGCTTGAAT-1 0.01127636 2.001086e-01 2.329982e-03 0.22292711
AACAACTGGTAGTTGC-1 0.06214291 6.172927e-05 6.172927e-05 0.07028278
AACCCAGAGACGGAGA-1 0.05533508 1.923783e-04 1.923783e-04 0.05714832
AACCGAGCTTGGTCAT-1 0.06701348 1.598240e-04 1.598240e-04 0.09502877
AACGATAGAAGGGCCG-1 0.04751156 7.180834e-07 7.180834e-07 0.01287445
有了这个结果,再加上空间对象的rds文件,可以借助其他反卷积的绘图函数绘制出其他反卷积结果中一样的那种饼图了,比如SPOTlight软件可以绘制的饼图:
教程中可以展示某一个细胞类型在空间数据中反卷积出来的比例可视化:
总共有17中细胞类型,这里展示Denate
# plot Dentate weights
p <- plot_puck_continuous(myRCTD@spatialRNA, barcodes, norm_weights[,'Denate'], ylimit = c(0,0.5),
title ='plot of Dentate weights', size=4.5, alpha=0.8)
ggsave(paste0(savedir, "/Spaital_weights.png"), width=8, height=6, plot=p,bg="white")
结果图:
借用STdeconvolve绘制一下饼图看看:
library(STdeconvolve)
library(ggplot2)
library(ggsci)
packageVersion("STdeconvolve")
m <- as.matrix(norm_weights)
p <- coords
plt <- vizAllTopics(theta = m,
pos = p,
topicOrder=seq(ncol(m)),
topicCols=rainbow(ncol(m)),
groups = NA,
group_cols = NA,
r = 56, # size of scatterpies; adjust depending on the coordinates of the pixels
lwd = 0.3,
showLegend = TRUE,
plotTitle = "scatterpies")
## function returns a `ggplot2` object, so other aesthetics can be added on:
plt <- plt + ggplot2::guides(fill=ggplot2::guide_legend(ncol=2))
ggsave(paste0(savedir, "/Spaital_scatterpies.png"), width=12, height=6, plot=plt, bg="white")
结果图:
这个反卷积结果可以看到,某些区域细胞类型占比比较大,具有区域特征。绘图的时候还有个小技巧:同种类型的细胞的不同亚型,可以设置同种色系,可以方便更清晰的看出来区域特征。这里就不展示了,毕竟还要去找色系来对应。
这个教程后半段还可以探索不同region之间的差异表达基因。