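Preface
For work I have long wanted a figure that better shows the information flow "gene name → gene functional category → process the gene takes part in → potential regulators of the gene (optional)". In short, an alluvial diagram conveys the number of genes and their functional categories better than a table, and helps readers group and summarize the patterns and specificity of up- and down-regulation under different conditions. There are already mature online tools for this kind of diagram, RAWGraphs for example. But life is full of tinkering: if you want finer control over your alluvial diagram, an offline plotting tool is essential. Fortunately, R never lacks wheels of this kind, hence this post.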
ggalluvial
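Let's start with ggalluvial. Since the goal is an alluvial diagram, the natural first step is to see how the R package of that very name performs. The tutorial is simple, so first a word about the data. It is the Titanic dataset: counts of people on board broken down by sex (Sex), age (Age), class (Class) and whether they survived (Survived). Everything below uses this same dataset, which makes the packages easy to compare.
After some tuning, this is the code that gave me the result I wanted: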
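library("ggalluvial")
# Load the data: one row per Class/Sex/Age/Survived combination, plus a Freq column
titanic_wide <- as.data.frame(Titanic)
# Take a look at the data structure
titanic_wide
# Plot: data = the table above, four axes in total, y (flow height) is the frequency
ggplot(data = titanic_wide,
       aes(axis1 = Class, axis2 = Sex, axis3 = Age, axis4 = Survived,
           y = Freq)) +
  # width sets the width of the axis boxes; flows are colored by Survived
  geom_alluvium(aes(fill = Survived), width = 1/30) +
  # width must match the previous layer; the border thickness can be set with
  # size (linewidth in recent ggplot2), but I did not find the result prettier
  geom_stratum(width = 1/30, alpha = .7, fill = "black", color = "white") +
  # label position; nudge_x adds a little extra distance from the axis
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), nudge_x = .10) +
  theme_void()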
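The final result looks like this:
How to put it: the four values within an axis really sit a bit too close together. I would be happy if they could be spread apart a little.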
ggforce
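ggforce is an impressive package that can draw a great many things. Here I only use one small feature, parallel sets. Because it is not a dedicated alluvial diagram tool, the data need a little reshaping first.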
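To make the reshaping less mysterious, here is a rough base-R imitation of the long format that the plotting code relies on (columns `id`, `x`, `y`, `value`). It is only a sketch: the names `wide`, `axes` and `long` are mine, and the real `gather_set_data()` may differ in details such as column types depending on the ggforce version.

```r
# A base-R imitation of the long "one block per axis" layout (illustrative only)
wide <- as.data.frame(Titanic)
names(wide)[names(wide) == "Freq"] <- "value"   # same column name as reshape2::melt()
wide$id <- seq_len(nrow(wide))                  # one id per original row (combination)

axes <- c("Class", "Sex", "Age", "Survived")

# Stack one copy of the table per axis; x names the axis, y holds the level on it
long <- do.call(rbind, lapply(axes, function(axis) {
  block <- wide
  block$x <- axis
  block$y <- as.character(wide[[axis]])
  block
}))

head(long[, c("id", "x", "y", "value")])
```

Each original row shows up once per axis, so the same `id` ties together the four positions that one flow passes through.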
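library("ggforce")
# Reshape the data
data <- reshape2::melt(Titanic)
data <- gather_set_data(data, 1:4)
# Plot. Have a look at the data structure if you are curious. x is the variable
# (Class, Sex, Age, ...), y is the level within it (1st, 2nd, 3rd, ...),
# value is the frequency, and id identifies the original row.
ggplot(data, aes(x, id = id, split = y, value = value)) +
  # fill color, transparency, axis width
  geom_parallel_sets(aes(fill = Class), alpha = 0.3, axis.width = 0.05) +
  geom_parallel_sets_axes(axis.width = 0.05) +
  # order of the x axis: put whichever variable you want first at the front
  scale_x_discrete(limits = c("Class", "Sex", "Age", "Survived")) +
  # labels nudged slightly to the right and left-aligned (hjust = 0)
  geom_parallel_sets_labels(colour = 'black', angle = 0,
                            position = position_nudge(x = .2, y = 0), hjust = 0) +
  theme_void()  # remove axes and frame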
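This basically satisfies my urge to tinker. It would be even better if the labels could be left-aligned; if anyone knows a good way, please share it in the comments.
Update: while digging through the code I found the relevant layer setting. The argument is passed on to ggplot2's layer, and the underlying parameter is described in the ggrepel documentation.
Concretely, alignment is controlled by hjust: 0 means left-aligned, 1 means right-aligned, and 0.5 means centered.
Reference: https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html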
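To see the three hjust values side by side, here is a minimal sketch in plain ggplot2 (not ggforce). The data frame `demo` and the object `p_hjust` are made up for illustration; the dashed line marks the x position that every label is anchored to.

```r
library(ggplot2)

# Three labels anchored at the same x position, differing only in hjust
demo <- data.frame(
  x = 1,
  y = 3:1,
  label = c("hjust = 0 (left-aligned)",
            "hjust = 0.5 (centered)",
            "hjust = 1 (right-aligned)"),
  hjust = c(0, 0.5, 1)
)

p_hjust <- ggplot(demo, aes(x, y, label = label)) +
  geom_vline(xintercept = 1, linetype = "dashed") +  # the anchor position
  geom_text(aes(hjust = hjust)) +                    # alignment relative to the anchor
  xlim(0, 2) +
  theme_void()

p_hjust
```

With hjust = 0 the text starts at the anchor and extends to the right, which is the behavior wanted for labels placed to the right of an axis.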
easyalluvial
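I tinkered less with this one. Its advantage is that it can produce interactive alluvial diagrams. I have not studied it in depth myself; give it a try if you like it.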
library("easyalluvial")
library("parcats")
p2 <- alluvial_wide(data = titanic_wide,
                    max_variables = 5,
                    fill_by = 'all_flows')
p2
parcats(p2, marginal_histograms = TRUE, data_input = titanic_wide)
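One caveat, offered as my understanding rather than something verified in this post: alluvial_wide() appears to treat every row as one observation, whereas titanic_wide is an aggregated table with a Freq column. If that reading is right, flow sizes only reflect passenger counts once the table is expanded to one row per person. A minimal sketch under that assumption follows; `titanic_cases` and `p_cases` are names I made up.

```r
# Expand the aggregated table to one row per person (base R only)
titanic_wide <- as.data.frame(Titanic)
row_index <- rep(seq_len(nrow(titanic_wide)), times = titanic_wide$Freq)
titanic_cases <- titanic_wide[row_index, c("Class", "Sex", "Age", "Survived")]
rownames(titanic_cases) <- NULL

nrow(titanic_cases)  # should equal sum(titanic_wide$Freq)

# Optional: only runs when easyalluvial is installed
if (requireNamespace("easyalluvial", quietly = TRUE)) {
  p_cases <- easyalluvial::alluvial_wide(data = titanic_cases,
                                         fill_by = "all_flows")
  # print(p_cases) draws the plot
}
```

Combinations with a frequency of zero simply disappear in the expanded table, because rep() repeats their row index zero times.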
networkD3
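Another large package. It can certainly draw the figure, but it is not based on ggplot2. I am not very familiar with it, so I will simply leave a demo here. The table lists the links: s is the source node, t is the target node, and v is the value.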
s | t | v |
---|---|---|
1st | Male | 0 |
2nd | Male | 0 |
3rd | Male | 35 |
Crew | Male | 0 |
1st | Female | 0 |
2nd | Female | 0 |
3rd | Female | 17 |
Crew | Female | 0 |
1st | Male | 118 |
2nd | Male | 154 |
3rd | Male | 387 |
Crew | Male | 670 |
1st | Female | 4 |
2nd | Female | 13 |
3rd | Female | 89 |
Crew | Female | 3 |
1st | Male | 5 |
2nd | Male | 11 |
3rd | Male | 13 |
Crew | Male | 0 |
1st | Female | 1 |
2nd | Female | 13 |
3rd | Female | 14 |
Crew | Female | 0 |
1st | Male | 57 |
2nd | Male | 14 |
3rd | Male | 75 |
Crew | Male | 192 |
1st | Female | 140 |
2nd | Female | 80 |
3rd | Female | 76 |
Crew | Female | 20 |
Male | Child | 0 |
Male | Child | 0 |
Male | Child | 35 |
Male | Child | 0 |
Female | Child | 0 |
Female | Child | 0 |
Female | Child | 17 |
Female | Child | 0 |
Male | Adult | 118 |
Male | Adult | 154 |
Male | Adult | 387 |
Male | Adult | 670 |
Female | Adult | 4 |
Female | Adult | 13 |
Female | Adult | 89 |
Female | Adult | 3 |
Male | Child | 5 |
Male | Child | 11 |
Male | Child | 13 |
Male | Child | 0 |
Female | Child | 1 |
Female | Child | 13 |
Female | Child | 14 |
Female | Child | 0 |
Male | Adult | 57 |
Male | Adult | 14 |
Male | Adult | 75 |
Male | Adult | 192 |
Female | Adult | 140 |
Female | Adult | 80 |
Female | Adult | 76 |
Female | Adult | 20 |
Child | No | 0 |
Child | No | 0 |
Child | No | 35 |
Child | No | 0 |
Child | No | 0 |
Child | No | 0 |
Child | No | 17 |
Child | No | 0 |
Adult | No | 118 |
Adult | No | 154 |
Adult | No | 387 |
Adult | No | 670 |
Adult | No | 4 |
Adult | No | 13 |
Adult | No | 89 |
Adult | No | 3 |
Child | Yes | 5 |
Child | Yes | 11 |
Child | Yes | 13 |
Child | Yes | 0 |
Child | Yes | 1 |
Child | Yes | 13 |
Child | Yes | 14 |
Child | Yes | 0 |
Adult | Yes | 57 |
Adult | Yes | 14 |
Adult | Yes | 75 |
Adult | Yes | 192 |
Adult | Yes | 140 |
Adult | Yes | 80 |
Adult | Yes | 76 |
Adult | Yes | 20 |
Copy the table above to the clipboard, then plot:
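library("networkD3")
# Read the copied table; it must be tab-separated on the clipboard.
# On macOS use pipe("pbpaste") instead of "clipboard".
a <- read.table("clipboard", header = TRUE, sep = '\t')
a$s <- as.character(a$s)
a$t <- as.character(a$t)
Sankeylinks <- a  # the link (edge) data
# The node data: deduplicate with unique(), store as a data frame with the column name "name"
Sankeynodes <- data.frame(name = unique(c(Sankeylinks$s, Sankeylinks$t)))
# Add a zero-based index column (0 to number of rows - 1) for the merges below
Sankeynodes$index <- 0:(nrow(Sankeynodes) - 1)
# Merge links with nodes: the index of the source s becomes column 4
Sankeylinks <- merge(Sankeylinks, Sankeynodes, by.x = "s", by.y = "name")
# Merge again: the index of the target t becomes column 5
Sankeylinks <- merge(Sankeylinks, Sankeynodes, by.x = "t", by.y = "name")
# Keep columns 4, 5 and 3, i.e. source, target and the link value (weight)
Sankeydata <- Sankeylinks[, c(4, 5, 3)]
names(Sankeydata) <- c("Source", "Target", "Value")  # name the three columns
Sankeyname <- Sankeynodes[, 1, drop = FALSE]  # node names, i.e. the first column
sankeyNetwork(Links = Sankeydata, Nodes = Sankeyname, Source = "Source",
              Target = "Target", Value = "Value", NodeID = "name",
              fontSize = 8, nodeWidth = 20)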
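The copy-and-paste step can be avoided by building the link table directly from the Titanic dataset. The following is a minimal sketch, not the method used above: the names `axes`, `links` and `nodes` are mine, and unlike the pasted table it sums duplicate source/target pairs into a single link.

```r
# Build s/t/v links between neighboring axes straight from the dataset
titanic_wide <- as.data.frame(Titanic)
axes <- c("Class", "Sex", "Age", "Survived")

links <- do.call(rbind, lapply(seq_len(length(axes) - 1), function(i) {
  pair <- data.frame(
    s = as.character(titanic_wide[[axes[i]]]),      # source level
    t = as.character(titanic_wide[[axes[i + 1]]]),  # target level on the next axis
    v = titanic_wide$Freq
  )
  aggregate(v ~ s + t, data = pair, FUN = sum)      # one row per source/target pair
}))

# Node table plus zero-based indices, as sankeyNetwork() expects
nodes <- data.frame(name = unique(c(as.character(links$s), as.character(links$t))))
links$Source <- match(links$s, nodes$name) - 1L
links$Target <- match(links$t, nodes$name) - 1L

# Optional: only runs when networkD3 is installed
if (requireNamespace("networkD3", quietly = TRUE)) {
  sankey <- networkD3::sankeyNetwork(Links = links, Nodes = nodes,
                                     Source = "Source", Target = "Target",
                                     Value = "v", NodeID = "name",
                                     fontSize = 8, nodeWidth = 20)
  # print(sankey) opens the interactive widget
}
```

Using match() on the node names replaces the two merge() calls and keeps the link rows in their original order.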
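How to put it: this is yet another interactive figure, and what I need is a static one. Those who enjoy tinkering can have a play with it.
Finally, good luck with your research...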