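Below is a figure from the paper. It is essentially a violin plot, except that two half-violins are placed back to back.
Today we will try to reproduce it.
We downloaded the expression values of all genes across the samples from the paper.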
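library(ggplot2)
library(gghalves)   # geom_half_violin()
library(tidyverse)  # dplyr (%>%, filter) and stringr (str_split)
library(reshape2)   # melt()
library(ggpubr)     # stat_compare_means()
data <- read.csv("data.csv", header = TRUE)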
We randomly pick 10 genes from it to plot.
data <- data[sample(1:nrow(data), 10),]
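Because `sample()` draws a different set of genes on every run, the figure changes each time the script is executed. Below is a minimal sketch of making the draw reproducible with `set.seed()`. The toy data frame and the seed value 123 are assumptions for illustration only and are not part of the original data.

```r
# Hypothetical stand-in for the expression table (not the real data)
toy <- data.frame(gene = paste0("Gene", 1:100), s1 = rnorm(100))

set.seed(123)                          # fix the random number generator state
picked <- toy[sample(nrow(toy), 10), ]

set.seed(123)                          # the same seed selects the same rows again
picked_again <- toy[sample(nrow(toy), 10), ]

# TRUE: both draws picked the same genes
identical(picked$gene, picked_again$gene)
```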
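As before, we convert the wide matrix (one column per sample) into long format (one row per gene-sample pair).
data_new <- reshape2::melt(data, id.vars = "gene")
Then rename the columns.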
colnames(data_new) <- c("Genes","Samples","Values")
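For readers who prefer to stay within the tidyverse, the same wide-to-long reshaping can probably be done with `tidyr::pivot_longer()`. This is only a sketch on a toy table: the gene names, sample IDs, and values are hypothetical, and the row order differs from what `melt()` returns.

```r
library(tidyr)
library(dplyr)

# Toy wide table: one row per gene, one column per sample (hypothetical values)
wide <- data.frame(
  gene = c("GeneA", "GeneB"),
  b1_c1_r1_Treated   = c(5.1, 7.3),
  b1_c1_r1_Untreated = c(4.8, 6.9)
)

# Every column except `gene` becomes a (Samples, Values) pair
long <- wide %>%
  pivot_longer(cols = -gene, names_to = "Samples", values_to = "Values") %>%
  rename(Genes = gene)

print(long)
```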
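We use the sample IDs to assign each sample to a group: the ID is split on "_" and the fourth field is taken as the group name.
data_new$group <- str_split(data_new$Samples, "_", simplify = TRUE)[, 4]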
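To make the splitting step concrete, here is a small sketch of what `str_split(..., simplify = TRUE)` returns. The sample IDs below are hypothetical; the only assumption carried over from the text is that the group name sits in the fourth underscore-separated field.

```r
library(stringr)

# Hypothetical sample IDs with four "_"-separated fields
samples <- c("b1_c1_r1_Treated", "b1_c1_r2_Untreated")

# simplify = TRUE returns a character matrix: one row per ID, one column per field
parts <- str_split(samples, "_", simplify = TRUE)

# The fourth column holds the group label
group <- parts[, 4]
print(group)
```

If the real IDs have a different number of fields, the column index has to be adjusted accordingly.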
ggplot(data_new, aes(x = Genes, y = Values, fill = group)) +
  geom_violin(position = position_dodge(width = 1), scale = 'width') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "top",
        legend.justification = "right")
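This gives an ordinary violin plot, split into two groups, treated and untreated.
Next, we borrow a trick from raincloud plots and use geom_half_violin to draw one half for each group, one facing left and the other facing right.
We first subset the data for each group and then add them as separate layers.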
Treat <- data_new %>% filter(group=="Treated")
Untreat <- data_new %>% filter(group=="Untreated")
ggplot() +
  # Add the values of the treated group, drawn on the left
  geom_half_violin(data = Treat, aes(x = Genes, y = Values),
                   position = position_dodge(width = 1),
                   scale = 'width',
                   colour = NA, fill = "#1ba7b3",
                   side = "l") +
  # Add the values of the untreated group, drawn on the right
  geom_half_violin(data = Untreat, aes(x = Genes, y = Values),
                   position = position_dodge(width = 1),
                   scale = 'width',
                   colour = NA, fill = "#dfb424",
                   side = "r") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  #theme_bw()+
  xlab("") +
  ylab("log2(CPM)")
As with the bar charts we handled earlier, we can also add between-group significance marks with stat_compare_means (from the ggpubr package).
ggplot() +
  geom_half_violin(data = Treat, aes(x = Genes, y = Values),
                   position = position_dodge(width = 1),
                   scale = 'width',
                   colour = NA, fill = "#1ba7b3",
                   side = "l") +
  geom_half_violin(data = Untreat, aes(x = Genes, y = Values),
                   position = position_dodge(width = 1),
                   scale = 'width',
                   colour = NA, fill = "#dfb424",
                   side = "r") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "top",
        legend.justification = "right") +
  #theme_bw()+
  xlab("") +
  ylab("log2(CPM)") +
  stat_compare_means(data = data_new, aes(x = Genes, y = Values, group = group),
                     symnum.args = list(cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                                        symbols = c("***", "**", "*", "-")),
                     label = "p.signif")
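To see roughly what the significance labels represent, the sketch below runs one test per gene by hand and maps the p-values to the same symbols. It assumes that stat_compare_means falls back to a Wilcoxon rank-sum test when two groups are compared, which is my understanding of its default; the toy data and the helper `p_to_symbol` are hypothetical, and the exact handling of values that fall on a cutpoint may differ from ggpubr.

```r
# Toy data: GeneA differs between groups, GeneB does not (simulated values)
set.seed(42)
toy <- data.frame(
  Genes  = rep(c("GeneA", "GeneB"), each = 20),
  group  = rep(rep(c("Treated", "Untreated"), each = 10), times = 2),
  Values = c(rnorm(10, 8), rnorm(10, 5), rnorm(10, 5), rnorm(10, 5))
)

# One Wilcoxon rank-sum test per gene, comparing the two groups
p_values <- sapply(split(toy, toy$Genes), function(d) {
  wilcox.test(Values ~ group, data = d)$p.value
})

# Hypothetical helper: map p-values to symbols using the cutpoints from the plot
p_to_symbol <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 1),
                   labels = c("***", "**", "*", "-"),
                   include.lowest = TRUE))
}

print(data.frame(p = p_values, label = p_to_symbol(p_values)))
```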
Next, we need to add a summary point (mean or median) and error bars. This is done mainly with the stat_summary function.
ggplot() +
  geom_half_violin(data = Treat, aes(x = Genes, y = Values),
                   position = position_dodge(width = 1),
                   scale = 'width',
                   colour = NA, fill = "#1ba7b3",
                   side = "l") +
  geom_half_violin(data = Untreat, aes(x = Genes, y = Values),
                   position = position_dodge(width = 1),
                   scale = 'width',
                   colour = NA, fill = "#dfb424",
                   side = "r") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "top",
        legend.justification = "right") +
  #theme_bw()+
  xlab("") +
  ylab("log2(CPM)") +
  stat_compare_means(data = data_new, aes(x = Genes, y = Values, group = group),
                     symnum.args = list(cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                                        symbols = c("***", "**", "*", "-")),
                     label = "p.signif") +
  stat_summary(data = data_new, aes(x = Genes, y = Values, group = group),
               fun = "mean", geom = "point",
               shape = 23, size = 3, fill = "white",
               position = position_dodge(width = 0.2))
Here, fun = "mean", geom = "point" adds a point at the mean of each group.
Using fun = "median", geom = "point" instead adds the median.
ggplot() +
  geom_half_violin(data = Treat, aes(x = Genes, y = Values),
                   position = position_dodge(width = 1),
                   scale = 'width',
                   colour = NA, fill = "#1ba7b3",
                   side = "l") +
  geom_half_violin(data = Untreat, aes(x = Genes, y = Values),
                   position = position_dodge(width = 1),
                   scale = 'width',
                   colour = NA, fill = "#dfb424",
                   side = "r") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "top",
        legend.justification = "right") +
  #theme_bw()+
  xlab("") +
  ylab("log2(CPM)") +
  stat_compare_means(data = data_new, aes(x = Genes, y = Values, group = group),
                     symnum.args = list(cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                                        symbols = c("***", "**", "*", "-")),
                     label = "p.signif") +
  stat_summary(data = data_new, aes(x = Genes, y = Values, group = group),
               fun = mean,
               fun.min = function(x){quantile(x)[2]},
               fun.max = function(x){quantile(x)[4]},
               geom = "pointrange",
               #geom = 'errorbar',
               size = 0.5,
               position = position_dodge(width = 0.2))
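We use pointrange to draw the mean together with a range line. Note that the line here runs from the first quartile (quantile(x)[2]) to the third quartile (quantile(x)[4]), so it shows the interquartile range rather than a standard error.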
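The three summary functions passed to stat_summary can be checked by hand on a small vector. The numbers below are made up purely to show which element of `quantile()` is which.

```r
x <- c(1, 3, 5, 7, 9)      # toy values, not real expression data

q <- quantile(x)           # named vector: 0%, 25%, 50%, 75%, 100%
print(q)

centre <- mean(x)          # the point drawn by fun = mean
lower  <- unname(q[2])     # 25% quantile, the lower end (fun.min)
upper  <- unname(q[4])     # 75% quantile, the upper end (fun.max)

print(c(centre = centre, lower = lower, upper = upper))
```

For this vector the mean is 5 and the range runs from 3 to 7. Indexing by name, for example `quantile(x, 0.25)`, is arguably easier to read than positional indexing.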
However, I ran into one problem: I am not sure how to add a legend, and it seems it has to be added manually.
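One possible way around the missing legend, offered as an untested-on-the-real-data sketch: ggplot2 only builds a legend for aesthetics mapped inside `aes()`, so mapping `fill = group` and setting the colours with `scale_fill_manual()` should produce one. The toy data below is simulated and stands in for data_new; the colours are the ones used above.

```r
library(ggplot2)
library(gghalves)

# Simulated stand-in for data_new (hypothetical values)
set.seed(1)
toy <- data.frame(
  Genes  = rep(c("GeneA", "GeneB"), each = 40),
  group  = rep(rep(c("Treated", "Untreated"), each = 20), times = 2),
  Values = rnorm(80, mean = 5)
)

p <- ggplot(toy, aes(x = Genes, y = Values, fill = group)) +
  # fill is mapped (not fixed), so the fill scale can generate a legend
  geom_half_violin(data = subset(toy, group == "Treated"),
                   side = "l", colour = NA, scale = "width") +
  geom_half_violin(data = subset(toy, group == "Untreated"),
                   side = "r", colour = NA, scale = "width") +
  # Assign the two colours by group name
  scale_fill_manual(values = c(Treated = "#1ba7b3", Untreated = "#dfb424"),
                    name = NULL) +
  theme(legend.position = "top",
        legend.justification = "right") +
  labs(x = NULL, y = "log2(CPM)")

print(p)
```

Each layer contains only one group per gene, so the default dodging should not shift the halves apart; if it does in a given gghalves version, setting `position = "identity"` on both layers is worth trying.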