0.输入数据和R包
在生信星球公众号聊天框回复“forest779”即可获得输入数据。也可以自己根据表达矩阵与临床信息生成,如下:
rm(list = ls())
load("cox_dat.Rdata")
str(dat)
## 'data.frame': 516 obs. of 10 variables:
## $ time : num 47.867 0.533 39.7 24.5 49.767 ...
## $ event : num 0 0 1 1 0 0 0 0 0 0 ...
## $ gender: Factor w/ 2 levels "female","male": 2 1 2 2 2 2 2 1 2 2 ...
## $ stage : Factor w/ 4 levels "i","ii","iii",..: 3 3 1 1 2 2 1 1 1 1 ...
## $ age : num 66 77 57 59 57 67 70 52 51 53 ...
## $ miR21 : num 16.7 18 18 17.8 18.4 ...
## $ miR143: num 17 18 17.1 16.5 16.8 ...
## $ miR10b: num 17.7 17.9 17.5 17.1 17.1 ...
## $ miR192: num 15.3 11.6 14.4 14.2 14.6 ...
## $ miR183: num 8.44 11.24 10.53 10.03 8.43 ...
library(survival)
library(survminer)
library(forestplot)
library(stringr)
1.多因素cox回归
建模就一句代码,出森林图也是一句代码。
model <- coxph(Surv(time, event) ~., data = dat )
ggforest(model)
这个虽然是ggplot2对象,但是灰色背景去不掉,也没办法用常规的ggplot2语法去修改颜色,只能是导出去再慢慢调咯。
2.美化版森林图-forestplot
用到一个新的R包,forestplot。
它就没有ggforest那么智能了,森林图展示的内容是需要自己组织的。
2.1学学帮助文档
看看函数的帮助文档,最简单的用法:
row_names <- list(list("test = 1", expression(test >= 2)))
test_data <- data.frame(
coef = c(1.59, 1.24),
low = c(1.4, 0.78),
high = c(1.8, 1.55)
)
forestplot(row_names,
test_data$coef,
test_data$low,
test_data$high,
zero = 1,
cex = 2,
lineheight = "auto",
xlab = "Lab axis txt"
)
最核心的信息就是HR值和它的置信区间范围,我们可以从cox模型中提取到图上的这些信息。
2.2组织输入数据
照葫芦画瓢开始,准备添加在图上的label列。
m = summary(model)
colnames(m$coefficients)
## [1] "coef" "exp(coef)" "se(coef)" "z" "Pr(>|z|)"
colnames(m$conf.int)
## [1] "exp(coef)" "exp(-coef)" "lower .95" "upper .95"
#p值改一下格式,加上显著性
p = ifelse(
m$coefficients[, 5] < 0.001,
"<0.001 ***",
ifelse(
m$coefficients[, 5] < 0.01,
"<0.01 **",
ifelse(
m$coefficients[, 5] < 0.05,
paste(round(m$coefficients[, 5], 3), " *"),
round(m$coefficients[, 5], 3)
)
)
)
#HR和它的置信区间
dat2 = as.data.frame(round(m$conf.int[, c(1, 3, 4)], 2))
dat2 = tibble::rownames_to_column(dat2, var = "Trait")
colnames(dat2)[2:4] = c("HR", "lower", "upper")
#需要在图上显示的HR文字和p值
dat2$HR2 = paste0(dat2[, 2], "(", dat2[, 3], "-", dat2[, 4], ")")
dat2$p = p
str(dat2)
## 'data.frame': 10 obs. of 6 variables:
## $ Trait: chr "gendermale" "stageii" "stageiii" "stageiv" ...
## $ HR : num 1.08 1.22 2.5 6.39 1.03 1.41 0.86 0.89 0.82 0.95
## $ lower: num 0.76 0.61 1.6 4.12 1.01 1.14 0.74 0.75 0.76 0.85
## $ upper: num 1.53 2.46 3.89 9.9 1.05 1.75 0.99 1.05 0.9 1.06
## $ HR2 : chr "1.08(0.76-1.53)" "1.22(0.61-2.46)" "2.5(1.6-3.89)" "6.39(4.12-9.9)" ...
## $ p : chr "0.668" "0.574" "<0.001 ***" "<0.001 ***" ...
2.3基础画图
forestplot(
dat2[, c(1, 4, 6)],
mean = dat2[, 2],
lower = dat2[, 3],
upper = dat2[, 4],
zero = 1,
boxsize = 0.4,
col = fpColors(box = '#1075BB', lines = 'black', zero = 'grey'),
lty.ci = "solid",
graph.pos = 2
)
2.4加点细节
主要是分类向量,如这里的性别和分期,需要画出reference,那就需要在输入数据dat2里添加几行。
dat2$Trait = str_remove(dat2$Trait, "gender|stage")
ins = function(x) {
c(x, rep(NA, ncol(dat2) - 1))
}
dat2 = rbind(
c("Trait", NA, NA, NA, "HR", "p"),
ins("gender"),
ins("female"),
dat2[1, ],
ins("stage"),
ins("i"),
dat2[2:nrow(dat2), ]
)
for(i in 2:4) {
dat2[, i] = as.numeric(dat2[, i])
}
str(dat2)
## 'data.frame': 15 obs. of 6 variables:
## $ Trait: chr "Trait" "gender" "female" "male" ...
## $ HR : num NA NA NA 1.08 NA NA 1.22 2.5 6.39 1.03 ...
## $ lower: num NA NA NA 0.76 NA NA 0.61 1.6 4.12 1.01 ...
## $ upper: num NA NA NA 1.53 NA NA 2.46 3.89 9.9 1.05 ...
## $ HR2 : chr "HR" NA NA "1.08(0.76-1.53)" ...
## $ p : chr "p" NA NA "0.668" ...
重新画图
forestplot(
dat2[, c(1, 5, 6)],
mean = dat2[, 2],
lower = dat2[, 3],
upper = dat2[, 4],
zero = 1,
boxsize = 0.4,
col = fpColors(box = '#1075BB', lines = 'black', zero = 'grey'),
lty.ci = "solid",
graph.pos = 2,
#xticks = F,
is.summary = c(T, T, F, F, T, rep(F, 10)),
align = "l",
hrzl_lines = list(
"1" = gpar(lty=1),
"2" = gpar(lty=1),
"16"= gpar(lty=1)),
colgap = unit(5, 'mm')
)
差不多啦!可以。