R语言绘制组间比较散点图并自动添加P值信息

查询ggprism包使用时候发现官网给出的一示例图比较常用，这里记录学习一下。

image-20221208130405959

加载R包准备数据

## 加载R包
sapply(c('dplyr',"ggplot2","ggprism",
         "ggbeeswarm","rstatix"), require, character.only = TRUE)

## 准备数据
data("wings")
head(wings)

## 整理数据
wings$measure <- wings$measure %>% 
  gsub("\\.", " ", .) %>% 
  tools::toTitleCase() %>% 
  factor(., levels = c("Wing Size", "Cell Size", "Cell Number"))

head(wings)
# A tibble: 6 × 4
  sex   genotype  measure     percent.change
  <fct> <fct>     <fct>                <dbl>
1 male  Tps1MIC/+ Wing Size            -2.45
2 male  Tps1MIC/+ Cell Size             6.33
3 male  Tps1MIC/+ Cell Number          -8.41
4 male  Tps1MIC/+ Wing Size            -1.14
5 male  Tps1MIC/+ Cell Size            -2.53
6 male  Tps1MIC/+ Cell Number           1.26


## 基本绘图
p <- ggplot(wings, aes(x = measure, y = percent.change))+
  ggbeeswarm::geom_beeswarm(
    aes(fill = genotype), 
    dodge.width = 0.9, 
    shape = 21,
    cex = 3.5
  )
p


## 分面
p <- p+facet_wrap(
  ~sex,
  scales = 'free',
  labeller = labeller(sex = c(male = "\u2642", female = "\u2640"))
)+
  geom_hline(yintercept = 0, linetype = 2, linewidth = 0.3)

p

image-20221222145913758

注意两个细节：

tools::toTitleCase可以替换字符中每个单词的首字母大写，str_to_title函数也可类似这种效果

TotitleCase函数源码如下：
ToTitleCase <- function(x) {
  paste0(toupper(substr(x, 1, 1)), substr(x, 2, nchar(x)))
}

facet_wrap中的labeller属性对不同的因子类型设置不同的标记，例子中\u2642就是使用了unicode字符标记，推荐个Unicode Character Table便于查询

image-20221222150440938

添加均值


p <- p + stat_summary(
  geom = "crossbar",
  aes(fill = genotype),
  fun = mean,
  position = position_dodge(0.9),
  colour = "red",
  linewidth = 0.4, width = 0.7,
  show.legend = FALSE
)
p

image-20221222150633975

计算显著性P值

wings_pvals <- wings %>%
  group_by(sex, measure) %>%
  rstatix::t_test(
    percent.change ~ genotype, 
    p.adjust.method = "BH", 
    var.equal = TRUE, 
    ref.group = "Tps1MIC/+"
  ) %>%
  rstatix::add_x_position(x = "measure", dodge = 0.9) %>% # dodge must match points
  mutate(label = c("***", "*", "P = 0.26", "***", "***", "P = 0.65"))
  
  
p <- p + add_pvalue(
  wings_pvals, y = 10, xmin = "xmin", xmax = "xmax", tip.length = 0, 
  fontface = "italic", lineend = "round", bracket.size = 0.5
)
p

image-20221222150718356

这里主要利用rstatix包中的t_test函数批量计算P值，使用add_x_position自动计算P值的位置,信息最后通过add_pvalue函数根据位置信息自动标记于图中。

image-20221222151131661

设置主题及配色

## 添加主题元素
p <- p + theme_prism(
  base_fontface = "plain", 
  base_line_size = 0.7, 
  base_family = "Arial") + 
  scale_x_discrete(guide = guide_prism_bracket(width = 0.15), 
  labels = scales::wrap_format(5))+
  scale_y_continuous(
          limits = c(-20, 12),
          expand = c(0, 0),
          breaks = seq(-20, 10, 5),
          guide = "prism_offset") + 
  labs(y = "% change")  + 
  theme(
    legend.position = "bottom",
    axis.title.x = element_blank(),
    strip.text = element_text(size = 14),
    legend.spacing.x = unit(0, "pt"),
    legend.text = element_text(margin = margin(r = 20))
   ) +
  guides(fill = guide_legend(override.aes = list(size = 3)))
p

 
## 改变颜色及图例文本格式
p <- p + scale_fill_manual(
  values = c("#026FEE", "#87FFFF"), 
  labels = c(expression("Tps"*1^italic("MIC")~"/ +"), 
             expression("Tps"*1^italic("MIC")))
)
p

## 添加文本注释
p <- p + geom_text(
  data = data.frame(
    sex = factor("female", levels = c("male", "female")), 
    measure = "Cell Number", 
    percent.change = -18.5, 
    lab = "(n = 10)"
  ), 
  aes(label = lab)
)
p

image-20221222151215683

虽说是一个简单的示例图, 但其中有很多细节调整值得去学习，而且这种类型的图论文中也是比较常用的。

R语言绘制组间比较散点图并自动添加P值信息

加载R包准备数据

添加均值

计算显著性P值

设置主题及配色

推荐阅读更多精彩内容