在仙桃学术上的生信工具里面,有一个折线图的绘图工具,可以很快速便捷的得出结论并可视化结果,当然不是说这个功能有多强大,而是统计学方法非常专业。
比如用它自带的数据https://bioinfomatics.xiantao.love/biotools/data/demo/free/linePlot/%E6%8A%98%E7%BA%BF%E5%9B%BE.xlsx
通过无脑式的鼠标点击,可得到下面一系列的结果。
可以看到有很多结果,包括各种统计学描述、各种格式的图片,以及一个demo.R
比如统计描述
组别1 | 组别2 | 数目 | 最小值 | 最大值 | 中位数(Median) | 四分位距(IQR) | 下四分位 | 上四分位 | 均值(Mean) | 标准差(SD) | 标准误(SE) |
---|---|---|---|---|---|---|---|---|---|---|---|
Time1 | Control | 12 | 2.833 | 3.306 | 2.967 | 0.254 | 2.885 | 3.139 | 3.020 | 0.154 | 0.044 |
Time1 | Intervene | 12 | 2.624 | 3.418 | 3.074 | 0.297 | 2.876 | 3.173 | 3.032 | 0.258 | 0.074 |
Time2 | Control | 12 | 2.733 | 3.240 | 2.930 | 0.187 | 2.861 | 3.047 | 2.957 | 0.148 | 0.043 |
Time2 | Intervene | 12 | 3.419 | 4.318 | 4.082 | 0.388 | 3.824 | 4.212 | 4.005 | 0.285 | 0.082 |
Time3 | Control | 12 | 2.611 | 3.414 | 2.938 | 0.392 | 2.805 | 3.197 | 2.979 | 0.246 | 0.071 |
Time3 | Intervene | 12 | 5.799 | 6.285 | 6.065 | 0.185 | 5.916 | 6.101 | 6.030 | 0.139 | 0.040 |
又比如异常值分析、正态性检验(Shapiro-Wilk normality test)、方差齐性检验(Levene's test)、协方差同质假设(Homogeneity of covariances assumption/Box’s M-test)、球形假设(Mauchly's Test for Sphericity)、两因素单向重复测量数据方差分析(Two-way mixed ANOVA)、单独效应分析(Simple effect)、多重成对比较(Pairwise Comparisons of Estimated Marginal Means)等等专业术语,并且还给了详细的统计学结果。
这种统计学结果让大家很欣慰,那么这些结果都是如何计算出来的呢???
我们可以点开demo.R这个结果,读一读代码
https://bioinfomatics.xiantao.love/biotools/code/open/lineplot.R
现在我们复现一下:
加载需要的包并导入数据
library(ggplot2)
library(reshape2)
library(car)
library(rstatix)
## 导入数据,比如我们把下载的折线图数据保存在桌面
library(readxl)
data <- read_excel("~/Desktop/折线图.xlsx") ## mac系统代码
data # 显示结果
trt | Time1 | Time2 | Time3 |
---|---|---|---|
Control | 3.185968 | 2.875802 | 3.414224 |
Control | 2.832632 | 2.862111 | 2.801333 |
Control | 3.123533 | 2.984142 | 2.909648 |
Control | 2.880927 | 3.146924 | 3.256431 |
Control | 3.090936 | 2.732620 | 2.966887 |
Control | 2.920949 | 2.865287 | 2.820161 |
Control | 3.306013 | 2.856390 | 2.806807 |
Control | 2.885842 | 3.002480 | 2.611145 |
Control | 2.955919 | 3.239596 | 3.182478 |
Control | 2.978572 | 2.818567 | 2.734901 |
Control | 3.190647 | 3.042065 | 3.001664 |
Control | 2.883867 | 3.063226 | 3.240626 |
Intervene | 2.623973 | 3.771982 | 6.041055 |
Intervene | 2.663926 | 3.841685 | 6.098856 |
Intervene | 3.045286 | 4.301342 | 6.108293 |
Intervene | 3.054712 | 4.065593 | 6.085687 |
Intervene | 3.127857 | 4.097633 | 6.055522 |
Intervene | 3.147686 | 4.134947 | 5.926182 |
Intervene | 3.248095 | 3.419161 | 5.855298 |
Intervene | 3.326086 | 4.182629 | 5.885426 |
Intervene | 3.093890 | 4.318014 | 6.151859 |
Intervene | 3.418347 | 4.303615 | 6.284906 |
Intervene | 2.935396 | 3.986879 | 5.798966 |
Intervene | 2.697790 | 3.640084 | 6.073529 |
新增一列id,id即为数字,用于后续分析,必不可少
data$id <- 1:nrow(data)
trt | Time1 | Time2 | Time3 | id |
---|---|---|---|---|
Control | 3.185968 | 2.875802 | 3.414224 | 1 |
Control | 2.832632 | 2.862111 | 2.801333 | 2 |
Control | 3.123533 | 2.984142 | 2.909648 | 3 |
Control | 2.880927 | 3.146924 | 3.256431 | 4 |
Control | 3.090936 | 2.732620 | 2.966887 | 5 |
Control | 2.920949 | 2.865287 | 2.820161 | 6 |
Control | 3.306013 | 2.856390 | 2.806807 | 7 |
Control | 2.885842 | 3.002480 | 2.611145 | 8 |
Control | 2.955919 | 3.239596 | 3.182478 | 9 |
Control | 2.978572 | 2.818567 | 2.734901 | 10 |
Control | 3.190647 | 3.042065 | 3.001664 | 11 |
Control | 2.883867 | 3.063226 | 3.240626 | 12 |
Intervene | 2.623973 | 3.771982 | 6.041055 | 13 |
Intervene | 2.663926 | 3.841685 | 6.098856 | 14 |
Intervene | 3.045286 | 4.301342 | 6.108293 | 15 |
Intervene | 3.054712 | 4.065593 | 6.085687 | 16 |
Intervene | 3.127857 | 4.097633 | 6.055522 | 17 |
Intervene | 3.147686 | 4.134947 | 5.926182 | 18 |
Intervene | 3.248095 | 3.419161 | 5.855298 | 19 |
Intervene | 3.326086 | 4.182629 | 5.885426 | 20 |
Intervene | 3.093890 | 4.318014 | 6.151859 | 21 |
Intervene | 3.418347 | 4.303615 | 6.284906 | 22 |
Intervene | 2.935396 | 3.986879 | 5.798966 | 23 |
Intervene | 2.697790 | 3.640084 | 6.073529 | 24 |
将短数据转换为长数据
data2 <- gather(data, key = "x", value = "value", -trt, -id)
对各组进行统计学描述
data3 <- data2 %>%
group_by(trt, x) %>%
get_summary_stats(value)
trt | x | variable | n | min | max | median | q1 | q3 | iqr | mad | mean | sd | se | ci |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Control | Time1 | value | 12 | 2.833 | 3.306 | 2.967 | 2.885 | 3.139 | 0.254 | 0.156 | 3.020 | 0.154 | 0.044 | 0.098 |
Control | Time2 | value | 12 | 2.733 | 3.240 | 2.930 | 2.861 | 3.047 | 0.187 | 0.137 | 2.957 | 0.148 | 0.043 | 0.094 |
Control | Time3 | value | 12 | 2.611 | 3.414 | 2.938 | 2.805 | 3.197 | 0.392 | 0.252 | 2.979 | 0.246 | 0.071 | 0.156 |
Intervene | Time1 | value | 12 | 2.624 | 3.418 | 3.074 | 2.876 | 3.173 | 0.297 | 0.232 | 3.032 | 0.258 | 0.074 | 0.164 |
Intervene | Time2 | value | 12 | 3.419 | 4.318 | 4.082 | 3.824 | 4.212 | 0.388 | 0.327 | 4.005 | 0.285 | 0.082 | 0.181 |
Intervene | Time3 | value | 12 | 5.799 | 6.285 | 6.065 | 5.916 | 6.101 | 0.185 | 0.097 | 6.030 | 0.139 | 0.040 | 0.088 |
批量运行正态性检验(Shapiro-Wilk normality test)
for(i in unique(data[,1])){
data1 <- data[data[,1] == i,]
print(lapply(data1[,-1], function(x) shapiro.test(x)))
}
$Time1
Shapiro-Wilk normality test
data: x
W = 0.91913, p-value = 0.2788
$Time2
Shapiro-Wilk normality test
data: x
W = 0.88011, p-value = 0.08792
$Time3
Shapiro-Wilk normality test
data: x
W = 0.73897, p-value = 0.002055
$id
Shapiro-Wilk normality test
data: x
W = 0.95933, p-value = 0.7742
批量运行方差齐性检验(Levene's Test)
for(i in unique(data2$x)){
data1 <- data2[data2$x == i,]
print(leveneTest(value~trt, data = data1))
}
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 1.5617 0.2245
22
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 2.5864 0.122
22
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 3.8375 0.06291 .
22
Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1
两因素重复测量方差分析
anova_test(data = data2, dv = value, wid = id, within = x, between = trt)
ANOVA Table (type II tests)
$ANOVA
Effect DFn DFd F p p<.05 ges
1 trt 1 22 510.657 1.02e-16 * 0.918
2 x 2 44 391.826 9.19e-29 * 0.902
3 trt:x 2 44 407.706 4.01e-29 * 0.905$
Mauchly's Test for Sphericity
Effect W p p<.05
1 x 0.966 0.699
2 trt:x 0.966 0.699$
Sphericity Corrections
Effect GGe DF[GG] p[GG] p[GG]<.05 HFe DF[HF] p[HF] p[HF]<.05
1 x 0.968 1.94, 42.57 6.65e-28 * 1.059 2.12, 46.61 9.19e-29 *
2 trt:x 0.968 1.94, 42.57 2.98e-28 * 1.059 2.12, 46.61 4.01e-29 *
可视化结果
ggplot(data = data3, aes(x = x, y = mean, color = trt, group = trt)) +
geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), color = "black", width = 0.2) +
geom_line() +
geom_point() +
theme_bw()
当然这个图跟原图不一样,并且没有两两比较,我们可以使用ggpubr和rstatix进行二次绘图。
关于rstatix进行统计学分析,其实可以Repeated Measures ANOVA in R: The Ultimate Guide - Datanovia
得到答案,具体的分析,我们暂不说了,只说如何进行统计学分析,并且自动两两比较
res.aov <- anova_test(data = data2, dv = value, wid = id, within = x, between = trt)
# 组间两两比较
pwc <- data2 %>%
group_by(x) %>%
pairwise_t_test(
value ~ trt, paired = TRUE,
p.adjust.method = "bonferroni" ## 校正方法,包括 "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none".等
)
pwc
x | .y. | group1 | group2 | n1 | n2 | statistic | df | p | p.adj | p.adj.signif |
---|---|---|---|---|---|---|---|---|---|---|
Time1 | value | Control | Intervene | 12 | 12 | -0.1453235 | 11 | 8.87e-01 | 8.87e-01 | ns |
Time2 | value | Control | Intervene | 12 | 12 | -12.3908123 | 11 | 1.00e-07 | 1.00e-07 | **** |
Time3 | value | Control | Intervene | 12 | 12 | -40.9352857 | 11 | 0.00e+00 | 0.00e+00 | **** |
可以看到已经自动进行了两两比较,我们使用rstatix包的add_xy_position()
函数可以添加两两比较列表和x轴y轴位置。
pwc <- pwc %>% add_xy_position(x = "x")# 按时间分列
pwc
x | .y. | group1 | group2 | n1 | n2 | statistic | df | p | p.adj | p.adj.signif | y.position | groups | xmin | xmax |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | value | Control | Intervene | 12 | 12 | -0.1453235 | 11 | 8.87e-01 | 8.87e-01 | ns | 3.7834 | Control , Intervene | 0.8 | 1.2 |
2 | value | Control | Intervene | 12 | 12 | -12.3908123 | 11 | 1.00e-07 | 1.00e-07 | **** | 4.6834 | Control , Intervene | 1.8 | 2.2 |
3 | value | Control | Intervene | 12 | 12 | -40.9352857 | 11 | 0.00e+00 | 0.00e+00 | **** | 6.6504 | Control , Intervene | 2.8 | 3.2 |
可视化绘图
library(ggpubr)
ggline(data2,x='x',y='value',
color = 'trt', #按组配色
add = 'mean_sd', #添加均数标准差,也可以设置均数标准误,CI等。
palette = "aaas" # aaas杂志配色
)+stat_pvalue_manual(pwc, tip.length = 0, hide.ns = TRUE) + # 添加两两比较,隐藏无意义
labs(
subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
caption = get_pwc_label(pwc) # 右下角条件两两比较方法。
)
当然,我们还可以继续修改图片,比如全部显示结果,取消显著性标记的下划线,或者显示具体的p值等
ggline(data2,x='x',y='value',
color = 'trt', #按组配色
add = 'mean_sd', #添加均数标准差,也可以设置均数标准误,CI等。
palette = "aaas",# aaas杂志配色
ggtheme = theme_bw(), #背景
xlab = F,ylab = "Score", #xy轴名称
legend.title="Treatment" ,legend="top", #标签名称和位置
title = "Comparison of two groups at different times" #标题
)+stat_pvalue_manual(pwc, bracket.size = 0) + # 添加两两比较,显示所有结果
labs(
subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
caption = get_pwc_label(pwc) # 右下角显示事后两两比较方法。
)
也可以显示数值
ggline(data2,x='x',y='value',
color = 'trt', #按组配色
add = 'mean_sd', #添加均数标准差,也可以设置均数标准误,CI等。
palette = "aaas",# aaas杂志配色#背景
xlab = F,ylab = "Score", #xy轴名称
legend.title="Treatment" ,legend="right",
title = "Comparison of two groups at different times" #标题
)+stat_pvalue_manual(pwc, bracket.size = 0,label = "p.adj") + # 添加两两比较,显示所有结果
labs(
subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
caption = get_pwc_label(pwc) # 右下角显示事后两两比较方法。
)