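Scatter plots are commonly used to show the relationship between two variables. This article first shows how to draw a scatter plot in R, then uses functions from the ggpubr package to add the correlation coefficient and its significance level. It also covers coloring points by group, drawing an ellipse around each group, creating a bubble chart, and adding marginal plots (histograms, density plots, or box plots).
Load the data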
library(ggpubr)
# Load data
data("mtcars")
df <- mtcars
# Convert cyl as a grouping variable
df$cyl <- as.factor(df$cyl)
# Inspect the data
head(df[, c("wt", "mpg", "cyl", "qsec")])
##                     wt  mpg cyl qsec
## Mazda RX4         2.62 21.0   6 16.5
## Mazda RX4 Wag     2.88 21.0   6 17.0
## Datsun 710        2.32 22.8   4 18.6
## Hornet 4 Drive    3.21 21.4   6 19.4
## Hornet Sportabout 3.44 18.7   8 17.0
## Valiant           3.46 18.1   6 20.2
Basic scatter plot
ggscatter(df, x = "wt", y = "mpg",
          add = "reg.line",                 # Add regression line
          conf.int = TRUE,                  # Add confidence interval
          add.params = list(color = "blue",
                            fill = "lightgray")
          ) +
  stat_cor(method = "pearson", label.x = 3, label.y = 30)  # Add correlation coefficient
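To see where the annotated numbers come from, the correlation coefficient and its p-value can also be computed directly with base R. This is a minimal sketch that assumes stat_cor(method = "pearson") reports the same quantities as cor.test(); the variable names ct, r, and p are only illustrative.

```r
data("mtcars")

# Pearson correlation test between weight (wt) and fuel economy (mpg)
ct <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

r <- unname(ct$estimate)  # correlation coefficient
p <- ct$p.value           # two-sided p-value for H0: the correlation is zero

# Print both values in a compact form
cat(sprintf("R = %.2f, p = %.2g\n", r, p))
```

A strongly negative R here matches the downward-sloping regression line in the plot: heavier cars tend to have lower mpg.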
The point shape can be changed with the shape argument:
ggscatter(df, x = "wt", y = "mpg",
          shape = 18)
To see the other available point shapes, run:
show_point_shapes()
Color points by group
ggscatter(df, x = "wt", y = "mpg",
          add = "reg.line",                 # Add regression line
          conf.int = TRUE,                  # Add confidence interval
          color = "cyl", palette = "jco",   # Color by groups "cyl"
          shape = "cyl"                     # Change point shape by groups "cyl"
          ) +
  stat_cor(aes(color = cyl), label.x = 3)   # Add correlation coefficient
# Extend the regression line: fullrange = TRUE
# Add a marginal rug (marginal density): rug = TRUE
ggscatter(df, x = "wt", y = "mpg",
          add = "reg.line",                 # Add regression line
          color = "cyl", palette = "jco",   # Color by groups "cyl"
          shape = "cyl",                    # Change point shape by groups "cyl"
          fullrange = TRUE,                 # Extend the regression line
          rug = TRUE                        # Add marginal rug
          ) +
  stat_cor(aes(color = cyl), label.x = 3)   # Add correlation coefficient
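With aes(color = cyl), the correlation is annotated once per group. As a rough cross-check, the per-group coefficients can be computed in base R. This sketch assumes the grouping used by stat_cor() matches a plain split on cyl; by_cyl and r_by_cyl are illustrative names.

```r
data("mtcars")

# Split the data into one data frame per cylinder count
by_cyl <- split(mtcars, mtcars$cyl)

# Pearson correlation between wt and mpg within each group
r_by_cyl <- sapply(by_cyl, function(d) cor(d$wt, d$mpg, method = "pearson"))

round(r_by_cyl, 2)
```

Comparing these values with the overall coefficient shows how much of the relationship holds within each group rather than only across groups.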
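Add group ellipses
Main arguments:
- ellipse = TRUE: draw an ellipse around each group.
- ellipse.level: the size of the concentration ellipse, expressed as a normal probability. The default is 0.95.
- ellipse.type: the ellipse type. Possible values are "convex", "confidence", or any type supported by ggplot2::stat_ellipse(), that is, one of c("t", "norm", "euclid"). The default is "norm".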
ggscatter(df, x = "wt", y = "mpg",
          color = "cyl", palette = "jco",
          shape = "cyl",
          ellipse = TRUE)
# Change the ellipse type to "convex"
ggscatter(df, x = "wt", y = "mpg",
          color = "cyl", palette = "jco",
          shape = "cyl",
          ellipse = TRUE, ellipse.type = "convex")
# Add group mean points and a star plot
ggscatter(df, x = "wt", y = "mpg",
          color = "cyl", palette = "jco",
          shape = "cyl",
          ellipse = TRUE,
          mean.point = TRUE,
          star.plot = TRUE)
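The quantities behind the "convex" frame and the group mean points can be reproduced with base R. The sketch below is not ggpubr's internal code; it assumes the convex frame is the convex hull of each group's points and that the mean point is the per-group average of x and y. The names groups, centroids, and hulls are illustrative.

```r
data("mtcars")

# One data frame of (wt, mpg) coordinates per cylinder group
groups <- split(mtcars[, c("wt", "mpg")], mtcars$cyl)

# Group centroids: the kind of point that mean.point = TRUE marks
centroids <- t(sapply(groups, colMeans))

# Convex hull per group: names of the cars lying on the hull boundary
hulls <- lapply(groups, function(d) rownames(d)[chull(d$wt, d$mpg)])

centroids
hulls
```

In a star plot, each point is joined to its group centroid by a line segment, so the centroids computed here are the centers of the stars.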
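Add point labels
Main arguments:
- label: the name of the column that contains the point labels.
- font.label: a list that can combine the following elements: the font size (e.g., 14), the style (e.g., "plain", "bold", "italic", "bold.italic"), and the color (e.g., "red") of the labels. For example, font.label = list(size = 14, face = "bold", color = "red").
- label.select: a character vector specifying which labels to show.
- repel = TRUE: avoid overlapping labels.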
# Use row names as point labels
df$name <- rownames(df)
ggscatter(df, x = "wt", y = "mpg",
          color = "cyl", palette = "jco",
          label = "name", repel = TRUE)
# Specify which labels to show
ggscatter(df, x = "wt", y = "mpg",
          color = "cyl", palette = "jco",
          label = "name", repel = TRUE,
          label.select = c("Toyota Corolla", "Merc 280", "Duster 360"))
# Show only the labels that meet a criterion
ggscatter(df, x = "wt", y = "mpg",
          color = "cyl", palette = "jco",
          label = "name", repel = TRUE,
          label.select = list(criteria = "`x` > 4 & `y` < 15"))
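To check which points a criterion will label, the same condition can be evaluated on the data directly. This sketch assumes that `x` and `y` in the criteria string stand for the plotted variables, here wt and mpg; selected is an illustrative name.

```r
data("mtcars")

# Assumption: `x` maps to wt and `y` maps to mpg, as in the ggscatter() call
selected <- rownames(mtcars)[mtcars$wt > 4 & mtcars$mpg < 15]

selected
```

For mtcars this selects the three heaviest cars (Cadillac Fleetwood, Lincoln Continental, and Chrysler Imperial), which are the only points expected to carry a label in the last plot.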
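Bubble chart
In a bubble chart, the point size is controlled by a continuous variable (here "qsec"). The alpha argument controls the transparency of the color and takes a value between 0 and 1.
ggscatter(df, x = "wt", y = "mpg",
          color = "cyl", palette = "jco",
          size = "qsec", alpha = 0.5) +
  scale_size(range = c(0.5, 15))    # Adjust the range of point sizes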
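For readers who want to see the mapping spelled out, a roughly equivalent bubble chart can be written in plain ggplot2. This is a sketch and not what ggscatter() does internally; the default theme and the jco palette are left out, and bubble is an illustrative name.

```r
library(ggplot2)

data("mtcars")
df <- mtcars
df$cyl <- as.factor(df$cyl)

# Map qsec to point size and cyl to color; alpha makes overlaps visible
bubble <- ggplot(df, aes(x = wt, y = mpg, color = cyl, size = qsec)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(0.5, 15))  # smallest and largest point size

bubble
```

The range argument of scale_size() sets the sizes used for the smallest and largest qsec values, so widening it exaggerates the differences between bubbles.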
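Color by a continuous variable
Here the points are colored according to the value of a continuous variable (here "mpg"). By default a blue gradient is used; it can be changed with the function gradient_color().
# Color by a continuous variable
p <- ggscatter(df, x = "wt", y = "mpg",
               color = "mpg")
p
# Change the gradient colors
p + gradient_color(c("blue", "white", "red"))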
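A three-color gradient like this one interpolates between the listed colors. The base R sketch below previews such an interpolation with colorRampPalette(); it only illustrates the idea and is not necessarily the exact interpolation the plot uses. The name ramp is illustrative.

```r
# Build an interpolating palette function from the three anchor colors
ramp <- colorRampPalette(c("blue", "white", "red"))

# Five evenly spaced colors along the gradient, as hex codes
ramp(5)
```

With a diverging gradient such as this, the middle of the mpg range is drawn in white, so points near the extremes stand out most.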
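Add marginal plots
The ggMarginal() function in the ggExtra package adds marginal histograms, density plots, or box plots to a scatter plot.
First, install the ggExtra package:
install.packages("ggExtra")
Then draw the scatter plot:
# Add marginal density plots
library("ggExtra")
p <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
               color = "Species", palette = "jco",
               size = 3, alpha = 0.6)
ggMarginal(p, type = "density")
# Change the marginal plot type
ggMarginal(p, type = "boxplot")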
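The text mentions marginal histograms as well, so here is a short sketch of that variant. It assumes ggpubr and ggExtra are installed and rebuilds the same scatter plot so that it runs on its own; mh is an illustrative name.

```r
library(ggpubr)
library(ggExtra)

# Same scatter plot as above, rebuilt so this example is self-contained
p <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
               color = "Species", palette = "jco",
               size = 3, alpha = 0.6)

# Marginal histograms instead of density curves
mh <- ggMarginal(p, type = "histogram")
mh
```

The object returned by ggMarginal() combines the scatter plot and its margins in a single drawing, so it is printed as a whole rather than extended with further ggplot2 layers.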
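One limitation of ggMarginal() as used above is that the marginal plots do not distinguish the groups of the scatter plot (recent ggExtra versions provide the groupColour and groupFill arguments for this). The cowplot package offers another way to build grouped marginal plots.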
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
                color = "Species", palette = "jco",
                size = 3, alpha = 0.6) +
  border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
                   palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
                   palette = "jco") +
  rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
          rel_widths = c(2, 1), rel_heights = c(1, 2))
Add marginal box plots:
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
                color = "Species", palette = "jco",
                size = 3, alpha = 0.6, ggtheme = theme_bw())
# Marginal boxplot of x (top panel) and y (right panel)
xplot <- ggboxplot(iris, x = "Species", y = "Sepal.Length",
                   color = "Species", fill = "Species", palette = "jco",
                   alpha = 0.5, ggtheme = theme_bw()) +
  rotate()
yplot <- ggboxplot(iris, x = "Species", y = "Sepal.Width",
                   color = "Species", fill = "Species", palette = "jco",
                   alpha = 0.5, ggtheme = theme_bw())
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
          rel_widths = c(2, 1), rel_heights = c(1, 2))
However, the plots above have one flaw: there is extra space between the main plot and the marginal plots, which does not look good. One solution is as follows:
library(cowplot)
# Main plot
pmain <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  ggpubr::color_palette("jco")
# Marginal density plot along the x axis
# (size sets the outline width; ggplot2 >= 3.4 calls it linewidth)
xdens <- axis_canvas(pmain, axis = "x") +
  geom_density(data = iris, aes(x = Sepal.Length, fill = Species),
               alpha = 0.7, size = 0.2) +
  ggpubr::fill_palette("jco")
# Marginal density plot along the y axis
# Because coord_flip() is used here, axis_canvas() needs coord_flip = TRUE
ydens <- axis_canvas(pmain, axis = "y", coord_flip = TRUE) +
  geom_density(data = iris, aes(x = Sepal.Width, fill = Species),
               alpha = 0.7, size = 0.2) +
  coord_flip() +
  ggpubr::fill_palette("jco")
# Attach the marginal plots to the top and right of the main plot
p1 <- insert_xaxis_grob(pmain, xdens, grid::unit(.2, "null"), position = "top")
p2 <- insert_yaxis_grob(p1, ydens, grid::unit(.2, "null"), position = "right")
ggdraw(p2)