文章信息如图,今天学习复现的图表文章为最近发表的论文~
复刻图来源的文章
文章概要:在蓝藻聚集体内,蓝藻与共生细菌之间的物种间氢转移被发现是氮损失的新途径。蓝藻能够在昼夜周期下生成H2,作为邻近氢营养脱氮菌的电子供体。工程化蓝藻聚集体的氢营养脱氮速率为3.47 ± 0.42 mmol l–1 day–1,几乎占到异养脱氮速率的50%。在63个全球分布的蓝藻聚集体中,84%存在H2生成蓝藻和氢营养脱氮菌。研究表明,蓝藻繁盛期间的物种间氢转移可能导致氮损失率过高(对文章感兴趣自己看哦,Interspecies hydrogen transfer between cyanobacteria and symbiotic bacteria drives nitrogen loss)。
学习复现的图为 Fig. 6e
论文原图
R 代码复刻图(其他美化细节需要使用画图软件 AI 喽)
这张图展示了一个热图(heatmap),用于比较不同生态系统(如工程系统、湖泊、海洋、温泉和冰川)中微生物代谢过程的基因丰度。热图中的每个方块代表一个特定基因在特定生态系统中的相对丰度,颜色的深浅表示丰度的高低,颜色越深表示丰度越高。
此图其他适用场景:不同分组病人(样本)在感兴趣的几个通路中的一些重要基因的表达比较(把分组信息定义成自己感兴趣想比较的组就可以)。
生成模拟数据
#### 生成模拟数据 尽量 复现 原图 ####
library(writexl)
set.seed(123)
# 样本分组与命名
samples <- c(
paste0("W", 1:18), # Engineered system
paste0("L", 1:19), # Lake
paste0("M", 1:10), # Marine
paste0("H", 1:12), # Hot spring
paste0("G", 1:10) # Glacier
)
group2 <- c(
rep("Engineered system", 18),
rep("Lake", 19),
rep("Marine", 10),
rep("Hot spring", 12),
rep("Glacier", 10)
)
# 基因名(可自定义或用下方示例)
genes <- c(
"Aerobic respiration-CcoN", "Aerobic respiration-CoxA", "Aerobic respiration-CydA", "Aerobic respiration-CyoA",
"Nitrate reduction-NapA", "Nitrate reduction-NarG", "DNRA-NrfH", "Nitrite reduction-NirK",
"Nitric oxide reduction-NorB", "Dinitrogen fixation-NifH", "H2-uptake hydrogenase", "H2-evolving hydrogenase",
"H2-bidirectional hydrogenase", "Arsenate reduction-ArsC", "Sulfate reduction-Sat", "Sulfate reduction-AprA",
"Ferric iron reduction-MtrA", "Methane production-McrA", "Ferrous iron oxidation-Cyc2", "Thiosulfate oxidation-SoxB",
"Sulfide oxidation-Sqr", "Sulfide oxidation-DsrA", "Formate oxidation-FdhA", "PSI reaction centre-PsaA",
"PSII reaction centre-PsbA", "Light proton pumping-RHO"
)
n_gene <- length(genes)
n_sample <- length(samples)
# group2(功能分组)可随机分配或自定义
gene_group2 <- rep(c("Oxygen metabolism", "Nitrogen metabolism", "Hydrogen metabolism", "Other electron acceptors", "Other electron donors", "Phototrophy"), length.out = n_gene)
# 构建数据框
df <- data.frame(
Gene = genes,
group2 = gene_group2
)
# 添加样本数据(0~1之间的随机数)
for (s in samples) {
df[[s]] <- runif(n_gene)
}
# 写入Excel
write_xlsx(list("Fig.6e" = df), "multiGroup_heatmap_data.xlsx")
# 模拟 group.tsv
group_df <- data.frame(
Sample = samples,
group = group2
)
readr::write_tsv(group_df, "multiGroup_heatmap_group.tsv")
(自己有数据的话根据生成的模拟数据格式,把自己的数据替代模拟数据就可以了)
读入模拟数据
library(tidyverse)
library(readxl)
library(ggplot2)
library(ggh4x)
library(ggtext)
df <- read_excel("multiGroup_heatmap_data.xlsx",sheet ="Fig.6e") %>%
select(1:59) %>%
pivot_longer(-c(Gene,group2)) %>%
mutate(name=str_replace_all(name,"WW","W")) %>%
left_join(.,read_tsv("multiGroup_heatmap_group.tsv") %>%
mutate(Sample=str_replace_all(Sample,"WW","W")),
by=c("name"="Sample"))
# 定义因子
df$name <- factor(df$name,levels = unique(df$name))
df$Gene <- factor(df$Gene,levels = rev(unique(df$Gene)))
df$group <- factor(df$group,levels = unique(df$group))
df$group2 <- factor(df$group2,levels = unique(df$group2))
# 分面条带颜色
strips <- strip_themed(
background_y =elem_list_rect(fill=c("#EC7A05","#9AC2CD99","#82B7F590","#5A77D190",
"#FAD77B","#16419480")),
background_x=elem_list_rect(fill=c("#5A77D1","#44007A","#9AC2CD","#6B9B3C",
"#82B7F5")))
ggplot画图
ggplot(df,aes(name,Gene,fill=value))+
geom_tile(color="grey80")+
facet_grid2(group2~group,scale="free",
switch="y",strip=strips) +
force_panelsizes(cols=c(13,13,9,12,10),
rows = c(4,6,3,5,5,3),respect=F)+
scale_x_discrete(expand = c(0,0),
position = "top")+
scale_y_discrete(expand = c(0,0))+
scale_fill_gradientn(
colors = c("white", "#e0f3f3", "#66b2a3", "#004c4c", "black"),
values = scales::rescale(c(0, 0.01, 0.6, 0.95, 1.01)),
limits = c(0, 1),
oob = scales::squish,
breaks = c(0, 0.3, 0.6, 0.9, 1),
labels = c("0.0", "0.3", "0.6", "0.9", ">1.0")
)+
labs(x=NULL,y=NULL) +
coord_cartesian(clip="off") +
theme_test() +
theme(axis.text.x=
element_text(color="black",angle = 45,
vjust=0.5,hjust =0,size=7),
axis.text.y=element_text(color="black",size=7),
strip.placement = "outside",
axis.ticks = element_blank(),
panel.spacing.x = unit(0,"cm"),
panel.spacing.y = unit(0,"cm"),
strip.text.x =
element_text(color="white",size=10,face="bold"),
strip.text.y = element_markdown(color="black",size=7),
strip.background = element_rect(color="white"),
legend.background = element_blank(),
legend.title = element_blank(),
legend.key.height = unit(1,"null"))
以上代码运行完以后就是得到复刻的图了:
复刻图
其他注意事项
Error in validate_element_list(background_x, "element_rect") :
object 'is_theme_element' not found
如果报错,可能是 ggh4x包版本较老,或者和ggplot2版本不兼容。
解决方法
升级 ggh4x 和 ggplot2
请在R中运行以下命令,升级到最新版:
install.packages("ggplot2")
install.packages("ggh4x")
当前运行成功的参考代码环境:
> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS 15.4.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Asia/Shanghai
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils
[5] datasets methods base
(比较重要的是这一块环境)
other attached packages:
[1] writexl_1.5.4 ggtext_0.1.2
[3] ggh4x_0.3.1 readxl_1.4.3
[5] lubridate_1.9.3 forcats_1.0.0
[7] stringr_1.5.1 dplyr_1.1.4
[9] purrr_1.0.2 readr_2.1.5
[11] tidyr_1.3.1 tibble_3.2.1
[13] ggplot2_3.5.2 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] utf8_1.2.4 generics_0.1.3
[3] xml2_1.3.6 stringi_1.8.4
[5] hms_1.1.3 magrittr_2.0.3
[7] grid_4.3.3 timechange_0.3.0
[9] cellranger_1.1.0 fansi_1.0.6
[11] scales_1.3.0 cli_3.6.3
[13] rlang_1.1.4 crayon_1.5.3
[15] bit64_4.5.2 munsell_0.5.1
[17] commonmark_1.9.2 withr_3.0.2
[19] tools_4.3.3 tzdb_0.4.0
[21] colorspace_2.1-1 vctrs_0.6.5
[23] R6_2.5.1 lifecycle_1.0.4
[25] bit_4.5.0 vroom_1.6.5
[27] pkgconfig_2.0.3 pillar_1.9.0
[29] gtable_0.3.6 glue_1.8.0
[31] Rcpp_1.0.13-1 xfun_0.49
[33] tidyselect_1.2.1 rstudioapi_0.17.1
[35] farver_2.1.2 compiler_4.3.3
[37] markdown_1.13 gridtext_0.1.5