A Basic Guide to Plotting in R | 1. Bar Charts


Original link: https://mp.weixin.qq.com/s/sJ0aiLitY-ltPnXO3U8UMQ

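1. Bar Charts

This series covers how to draw basic and advanced graphics in R. It was originally recorded as a course for a company. After that arrangement fell through, I decided to share it for free on my WeChat official account. The video lessons will be uploaded gradually to my Bilibili channel, 木舟笔记. Thank you for your support!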

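  • 1.1 Drawing a simple bar chart

  • 1.2 Drawing a grouped bar chart

  • 1.3 Drawing a bar chart of counts

  • 1.4 Colouring a bar chart

  • 1.5 Colouring positive and negative bars differently

  • 1.6 Adjusting bar width and spacing

  • 1.7 Drawing a stacked bar chart

  • 1.8 Drawing a 100% stacked bar chart

  • 1.9 Adding data labels

  • 1.10 Drawing a Cleveland dot plot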

1.1 Drawing a simple bar chart

library(ggplot2)
library(gcookbook)  # data sets
ggplot(pg_mean, aes(x = group, y = weight)) + geom_bar(stat = "identity")
# The BOD data set has no row with Time == 6
BOD
##   Time demand
## 1    1    8.3
## 2    2   10.3
## 3    3   19.0
## 4    4   16.0
## 5    5   15.6
## 6    7   19.8
# Time is a numeric (continuous) variable
str(BOD)
## 'data.frame':    6 obs. of  2 variables:
##  $ Time  : num  1 2 3 4 5 7
##  $ demand: num  8.3 10.3 19 16 15.6 19.8
##  - attr(*, "reference")= chr "A1.4, p. 270"
# With a continuous x, the axis keeps an empty slot at Time == 6
ggplot(BOD, aes(x = Time, y = demand)) + geom_bar(stat = "identity")
# Use factor() to convert Time to a discrete (categorical) variable, which removes the empty slot
ggplot(BOD, aes(x = factor(Time), y = demand)) + geom_bar(stat = "identity")
# fill sets the fill colour of the bars; colour sets the colour of their outline
ggplot(pg_mean, aes(x = group, y = weight)) +
    geom_bar(stat = "identity", fill = "lightblue", colour = "black")
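
As a side note, recent ggplot2 releases (2.2.0 and later, as far as I know) offer geom_col() as shorthand for geom_bar(stat = "identity"). The sketch below checks that both draw the same bar heights. It uses a stand-in data frame, bod, typed in from the BOD values printed above so that it runs without gcookbook; the object names are illustrative only.

```r
library(ggplot2)

# Stand-in for BOD, typed in from the printed output above
bod <- data.frame(
  Time   = c(1, 2, 3, 4, 5, 7),
  demand = c(8.3, 10.3, 19.0, 16.0, 15.6, 19.8)
)

# Two ways to draw bars whose heights come straight from the data
p_bar <- ggplot(bod, aes(x = factor(Time), y = demand)) + geom_bar(stat = "identity")
p_col <- ggplot(bod, aes(x = factor(Time), y = demand)) + geom_col()

# layer_data() returns the values ggplot2 computed for a layer
h_bar <- layer_data(p_bar, 1)$y
h_col <- layer_data(p_col, 1)$y
isTRUE(all.equal(h_bar, h_col))
```

Either spelling works; geom_col() simply saves typing when the bar heights are already in the data.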

1.2 Drawing a grouped bar chart

library(gcookbook)  # For the data set
cabbage_exp
##   Cultivar Date Weight     sd  n      se
## 1      c39  d16   3.18 0.9566 10 0.30251
## 2      c39  d20   2.80 0.2789 10 0.08819
## 3      c39  d21   2.74 0.9834 10 0.31098
## 4      c52  d16   2.26 0.4452 10 0.14079
## 5      c52  d20   3.11 0.7909 10 0.25009
## 6      c52  d21   1.47 0.2111 10 0.06675
# Map a categorical variable to fill to draw a grouped bar chart.
# position = "dodge" places the bars of each group side by side; without it the bars are stacked.
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(position = "dodge", stat = "identity")
# colour = "black" adds a black outline to the bars.
# scale_fill_brewer() or scale_fill_manual() sets the fill colours;
# here the Pastel1 palette from the RColorBrewer package is used.
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(stat = "identity", position = "dodge", colour = "black") +
    scale_fill_brewer(palette = "Pastel1")
## If a combination of the categorical variables is missing from the data, its bar is not drawn
## and the neighbouring bar automatically expands to fill the space.
ce <- cabbage_exp[1:5, ]  # copy of the data without the last row
ce
##   Cultivar Date Weight        sd  n         se
## 1      c39  d16   3.18 0.9566144 10 0.30250803
## 2      c39  d20   2.80 0.2788867 10 0.08819171
## 3      c39  d21   2.74 0.9834181 10 0.31098410
## 4      c52  d16   2.26 0.4452215 10 0.14079141
## 5      c52  d20   3.11 0.7908505 10 0.25008887
ggplot(ce, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(stat = "identity", position = "dodge", colour = "black") +
    scale_fill_brewer(palette = "Pastel1")
## To prevent this, add a row for the missing combination by hand, with NA as its y value.
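
One possible way to add the NA row without typing it by hand is to build every Cultivar and Date combination and left-join the data onto it. This is only a sketch using base R; ce below is a stand-in for cabbage_exp[1:5, ] with just the three columns needed, and all_combos and ce_full are illustrative names.

```r
library(ggplot2)

# Stand-in for cabbage_exp[1:5, ], typed in from the printed output above
ce <- data.frame(
  Cultivar = c("c39", "c39", "c39", "c52", "c52"),
  Date     = c("d16", "d20", "d21", "d16", "d20"),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11)
)

# Every Cultivar x Date combination, including the one missing from ce
all_combos <- expand.grid(
  Cultivar = unique(ce$Cultivar),
  Date     = unique(ce$Date),
  stringsAsFactors = FALSE
)

# Left join: the missing combination (c52, d21) gets Weight = NA
ce_full <- merge(all_combos, ce, by = c("Cultivar", "Date"), all.x = TRUE)

# The NA row should reserve a slot, so the remaining d21 bar keeps its usual width
p <- ggplot(ce_full, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity", position = "dodge", colour = "black", na.rm = TRUE)
```

Here na.rm = TRUE only silences the warning about the removed NA row.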

1.3 Drawing a bar chart of counts

## The diamonds data set has 53,940 rows, each describing the quality of one diamond
diamonds
## # A tibble: 53,940 x 10
##    carat cut       color clarity depth table
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl>
##  1  0.23 Ideal     E     SI2      61.5    55
##  2  0.21 Premium   E     SI1      59.8    61
##  3  0.23 Good      E     VS1      56.9    65
##  4  0.29 Premium   I     VS2      62.4    58
##  5  0.31 Good      J     SI2      63.3    58
##  6  0.24 Very Good J     VVS2     62.8    57
##  7  0.24 Very Good I     VVS1     62.3    57
##  8  0.26 Very Good H     SI1      61.9    55
##  9  0.22 Fair      E     VS2      65.1    61
## 10  0.23 Very Good H     VS1      59.4    61
## # ... with 53,930 more rows, and 4 more
## #   variables: price <int>, x <dbl>,
## #   y <dbl>, z <dbl>
## By default geom_bar() uses stat = "count" (stat = "bin" in ggplot2 versions before 2.0.0),
## which counts the number of observations in each group.
ggplot(diamonds, aes(x = cut)) + geom_bar()
## If x is a continuous variable, one bar is drawn per distinct value and the result
## resembles a histogram; geom_histogram() is the usual choice when the values should be binned.
ggplot(diamonds, aes(x = carat)) + geom_bar()
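
To see what the counting step does, the bar heights can be compared with a plain table() of the same column. The sketch below assumes a reasonably recent ggplot2 in which geom_bar() counts rows per x value; the binwidth of 0.1 for the histogram is an arbitrary choice for illustration.

```r
library(ggplot2)

# diamonds ships with ggplot2; count the rows per cut in base R
counts_table <- table(diamonds$cut)

# Let geom_bar() do the counting and read the bar heights back
p <- ggplot(diamonds, aes(x = cut)) + geom_bar()
counts_plot <- layer_data(p, 1)$count

# Both vectors follow the order of the factor levels of cut
all(counts_plot == as.vector(counts_table))

# For a continuous variable, geom_histogram() bins the values explicitly
p_hist <- ggplot(diamonds, aes(x = carat)) + geom_histogram(binwidth = 0.1)
```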

1.4 Colouring a bar chart

library(gcookbook)
## The uspopchange data set describes the population change of each US state from 2000 to 2010.
## Select the 10 states with the fastest population growth and colour the bars by region.
upc <- subset(uspopchange, rank(Change) > 40)
upc
##             State Abb Region Change
## 3         Arizona  AZ   West   24.6
## 6        Colorado  CO   West   16.9
## 10        Florida  FL  South   17.6
## 11        Georgia  GA  South   18.3
## 13          Idaho  ID   West   21.1
## 29         Nevada  NV   West   35.1
## 34 North Carolina  NC  South   18.5
## 41 South Carolina  SC  South   15.3
## 44          Texas  TX  South   20.6
## 45           Utah  UT   West   23.8
ggplot(upc, aes(x = Abb, y = Change, fill = Region)) + geom_bar(stat = "identity")
## Use scale_fill_manual() to set the colours by hand; reorder(Abb, Change) sorts the bars by Change
ggplot(upc, aes(x = reorder(Abb, Change), y = Change, fill = Region)) +
    geom_bar(stat = "identity", colour = "black") +
    scale_fill_manual(values = c("#669933", "#FFCC66")) +
    xlab("State")
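
Two details of this recipe can be tried on a small scale: how rank() selects the largest values, and how a named colour vector ties each colour to a region by name rather than by position. The sketch below uses a stand-in upc typed in from the table above (without the State column); region_cols and top3 are illustrative names.

```r
library(ggplot2)

# Stand-in for upc, typed in from the printed output above
upc <- data.frame(
  Abb    = c("AZ", "CO", "FL", "GA", "ID", "NV", "NC", "SC", "TX", "UT"),
  Region = c("West", "West", "South", "South", "West",
             "West", "South", "South", "South", "West"),
  Change = c(24.6, 16.9, 17.6, 18.3, 21.1, 35.1, 18.5, 15.3, 20.6, 23.8)
)

# rank() gives 1 to the smallest value, so rank(x) > n - k keeps the k largest
top3 <- subset(upc, rank(Change) > nrow(upc) - 3)

# Named colours are matched to the Region values by name, not by position
region_cols <- c(South = "#669933", West = "#FFCC66")

p <- ggplot(upc, aes(x = reorder(Abb, Change), y = Change, fill = Region)) +
  geom_bar(stat = "identity", colour = "black") +
  scale_fill_manual(values = region_cols) +
  xlab("State")
```

With 50 states, rank(Change) > 40 follows the same pattern and keeps the top 10.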

1.5 Colouring positive and negative bars differently

library(gcookbook)
# Use a subset of the climate data as an example
csub <- subset(climate, Source == "Berkeley" & Year >= 1900)
csub$pos <- csub$Anomaly10y >= 0
head(csub)
##       Source Year Anomaly1y Anomaly5y Anomaly10y Unc10y   pos
## 101 Berkeley 1900        NA        NA     -0.171  0.108 FALSE
## 102 Berkeley 1901        NA        NA     -0.162  0.109 FALSE
## 103 Berkeley 1902        NA        NA     -0.177  0.108 FALSE
## 104 Berkeley 1903        NA        NA     -0.199  0.104 FALSE
## 105 Berkeley 1904        NA        NA     -0.223  0.105 FALSE
## 106 Berkeley 1905        NA        NA     -0.241  0.107 FALSE
ggplot(csub, aes(x = Year, y = Anomaly10y, fill = pos)) +
    geom_bar(stat = "identity", position = "identity")
## Use scale_fill_manual() to change the colours. guide = "none" (written guide = FALSE in older
## ggplot2 versions) removes the legend. colour and size set the colour and width of the bar
## outline; the width is in millimetres (newer ggplot2 versions name this argument linewidth).
ggplot(csub, aes(x = Year, y = Anomaly10y, fill = pos)) +
    geom_bar(stat = "identity", position = "identity", colour = "black", size = 0.25) +
    scale_fill_manual(values = c("#CCEEFF", "#FFDDDD"), guide = "none")
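
The unnamed colour vector relies on FALSE coming before TRUE. A slightly more explicit variant names the colours, as sketched below. The toy data frame holds made-up values for illustration only and is not taken from the climate data set.

```r
library(ggplot2)

# Toy values for illustration only (not from the climate data set)
toy <- data.frame(
  Year    = 2001:2006,
  Anomaly = c(-0.20, -0.10, -0.05, 0.05, 0.15, 0.30)
)
toy$pos <- toy$Anomaly >= 0

p <- ggplot(toy, aes(x = Year, y = Anomaly, fill = pos)) +
  geom_bar(stat = "identity", position = "identity", colour = "black") +
  scale_fill_manual(
    values = c("FALSE" = "#CCEEFF", "TRUE" = "#FFDDDD"),  # negative bars blue, positive bars pink
    guide  = "none"                                       # hide the legend
  )

# Read back the fill colour assigned to each bar
bars <- layer_data(p, 1)
unique(bars$fill[bars$y < 0])
unique(bars$fill[bars$y >= 0])
```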

1.6 Adjusting bar width and spacing

library(gcookbook)
# The width argument of geom_bar() makes the bars wider or narrower; its default value is 0.9
ggplot(pg_mean, aes(x = group, y = weight)) + geom_bar(stat = "identity")
# Narrower bars
ggplot(pg_mean, aes(x = group, y = weight)) + geom_bar(stat = "identity", width = 0.5)
# Wider bars: with width = 1 neighbouring bars touch
ggplot(pg_mean, aes(x = group, y = weight)) + geom_bar(stat = "identity", width = 1)
# Narrow bars in a grouped bar chart
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(stat = "identity", width = 0.5, position = "dodge")
# Increase the spacing between the bars within a group
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(stat = "identity", width = 0.5, position = position_dodge(0.7))

The following four commands are equivalent:

geom_bar(position = "dodge")
geom_bar(width = 0.9, position = position_dodge())
geom_bar(position = position_dodge(0.9))
geom_bar(width = 0.9, position = position_dodge(width = 0.9))
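
To check how width and position_dodge() interact, the computed bar edges can be read back with layer_data(). The sketch below uses a stand-in for cabbage_exp typed in from the values shown earlier. If dodging works as I understand it, the bar width of 0.5 is shared between the two bars of a group, and the dodge width of 0.7 spreads their centres apart.

```r
library(ggplot2)

# Stand-in for cabbage_exp (only the columns needed here)
cabbage <- data.frame(
  Cultivar = rep(c("c39", "c52"), each = 3),
  Date     = rep(c("d16", "d20", "d21"), times = 2),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11, 1.47)
)

p <- ggplot(cabbage, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity", width = 0.5, position = position_dodge(0.7))

bars <- layer_data(p, 1)

# Width of each individual bar after dodging
bar_width <- as.numeric(bars$xmax - bars$xmin)

# Distance between the centres of the two bars drawn at d16
centres <- as.numeric(bars$xmin + bars$xmax) / 2
d16_gap <- diff(sort(centres[as.numeric(bars$x) < 1.5]))
```

A dodge width larger than the bar width leaves a gap inside each group; making the two equal makes the bars touch.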

1.7 Drawing a stacked bar chart

library(gcookbook)
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) + geom_bar(stat = "identity")
## In older ggplot2 versions the default stacking order is the reverse of the legend order
## (since ggplot2 2.2.0 the two match by default). guides() can reverse the legend;
## name the aesthetic whose legend should be adjusted, here fill.
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(stat = "identity") +
    guides(fill = guide_legend(reverse = TRUE))
# Older ggplot2 versions allowed changing the stacking order by mapping order = desc(Cultivar),
# with desc() from the plyr package. The order aesthetic was deprecated in ggplot2 2.0.0 and is
# ignored by current versions, so position_stack(reverse = TRUE) is used here instead.
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(stat = "identity", position = position_stack(reverse = TRUE))
# Use scale_fill_brewer() to change the colours; colour = "black" adds a black outline to the bars
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(stat = "identity", colour = "black") +
    guides(fill = guide_legend(reverse = TRUE)) +
    scale_fill_brewer(palette = "Pastel1")
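
In current ggplot2 releases, as far as I know, the stacking order follows the factor levels of the fill variable, so there are two plausible ways to flip it: position_stack(reverse = TRUE), or reordering the levels. The sketch below compares them on a stand-in for cabbage_exp; bottom_height() is a hypothetical helper written for this example, not a ggplot2 function.

```r
library(ggplot2)

# Stand-in for cabbage_exp (only the columns needed here)
cabbage <- data.frame(
  Cultivar = rep(c("c39", "c52"), each = 3),
  Date     = rep(c("d16", "d20", "d21"), times = 2),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11, 1.47)
)

p_default <- ggplot(cabbage, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity")

# Option 1: reverse the stacking order without touching the data
p_reversed <- ggplot(cabbage, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity", position = position_stack(reverse = TRUE))

# Option 2: change the factor levels (this also changes the legend order)
cabbage2 <- cabbage
cabbage2$Cultivar <- factor(cabbage2$Cultivar, levels = c("c52", "c39"))
p_relevel <- ggplot(cabbage2, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity")

# Hypothetical helper: height of the segment at the bottom of the d16 stack
bottom_height <- function(p) {
  d <- layer_data(p, 1)
  d16 <- d[as.numeric(d$x) == 1, ]
  d16$ymax[which.min(d16$ymin)]
}

c(default  = bottom_height(p_default),
  reversed = bottom_height(p_reversed),
  relevel  = bottom_height(p_relevel))
```

Option 1 leaves the legend untouched, while option 2 reorders both the stack and the legend.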

1.8 Drawing a 100% stacked bar chart

## First use ddply() from the plyr package together with transform() to rescale the values in
## each stack so that they sum to 100%. Then draw a stacked bar chart from the result.
library(gcookbook)
library(plyr)
ce <- ddply(cabbage_exp, "Date", transform, percent_weight = Weight / sum(Weight) * 100)
ggplot(ce, aes(x = Date, y = percent_weight, fill = Cultivar)) + geom_bar(stat = "identity")
# Same chart with a reversed legend, the Pastel1 palette and black outlines
ggplot(ce, aes(x = Date, y = percent_weight, fill = Cultivar)) +
    geom_bar(stat = "identity", colour = "black") +
    guides(fill = guide_legend(reverse = TRUE)) +
    scale_fill_brewer(palette = "Pastel1")
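
The percentage step can also be written without plyr. The sketch below shows two alternatives on a stand-in for cabbage_exp: base R's ave() to compute the share within each Date, and position = "fill", which lets ggplot2 rescale each stack to 1 on its own. The percent label function is a small helper of my own, not part of the original recipe.

```r
library(ggplot2)

# Stand-in for cabbage_exp (only the columns needed here)
cabbage <- data.frame(
  Cultivar = rep(c("c39", "c52"), each = 3),
  Date     = rep(c("d16", "d20", "d21"), times = 2),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11, 1.47)
)

# Alternative 1: share of each cultivar within its date, in percent, using base R
cabbage$percent_weight <- cabbage$Weight / ave(cabbage$Weight, cabbage$Date, FUN = sum) * 100
p_manual <- ggplot(cabbage, aes(x = Date, y = percent_weight, fill = Cultivar)) +
  geom_bar(stat = "identity")

# Alternative 2: let position = "fill" rescale every stack to a total of 1
p_fill <- ggplot(cabbage, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_y_continuous(labels = function(x) paste0(x * 100, "%"))

# Each date should add up to 100 percent
totals <- tapply(cabbage$percent_weight, cabbage$Date, sum)
```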

1.9 Adding data labels

## geom_text() adds data labels to a bar chart
## The vjust argument moves the labels above or below the top of the bars
library(gcookbook)
# Labels just below the top of the bars
ggplot(cabbage_exp, aes(x = interaction(Date, Cultivar), y = Weight)) +
    geom_bar(stat = "identity") +
    geom_text(aes(label = Weight), vjust = 1.5, colour = "white")
# Labels just above the top of the bars
ggplot(cabbage_exp, aes(x = interaction(Date, Cultivar), y = Weight)) +
    geom_bar(stat = "identity") +
    geom_text(aes(label = Weight), vjust = -0.2)
# Raise the upper limit of the y axis so that the labels are not cut off
ggplot(cabbage_exp, aes(x = interaction(Date, Cultivar), y = Weight)) +
    geom_bar(stat = "identity") +
    geom_text(aes(label = Weight), vjust = -0.2) +
    ylim(0, max(cabbage_exp$Weight) * 1.05)
# Set the y position of the labels directly
ggplot(cabbage_exp, aes(x = interaction(Date, Cultivar), y = Weight)) +
    geom_bar(stat = "identity") +
    geom_text(aes(y = Weight + 0.1, label = Weight))
# For a grouped bar chart, dodge the labels by the same width as the bars
ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(stat = "identity", position = "dodge") +
    geom_text(aes(label = Weight), vjust = 1.5, colour = "white",
              position = position_dodge(0.9), size = 3)

## Adding data labels to a stacked bar chart
## The labels need the cumulative sum within each stack; sort the data first with arrange() from the plyr package
library(plyr)
# Sort the data by Date and Cultivar
# Note: this order assumes the first Cultivar level is stacked at the bottom, as in ggplot2
# before 2.2.0. Newer versions stack in the reverse order, so sort with
# arrange(cabbage_exp, Date, desc(Cultivar)) there.
ce <- arrange(cabbage_exp, Date, Cultivar)
# Compute the cumulative sum
ce <- ddply(ce, "Date", transform, label_y = cumsum(Weight))
ce
##   Cultivar Date Weight     sd  n      se label_y
## 1      c39  d16   3.18 0.9566 10 0.30251    3.18
## 2      c52  d16   2.26 0.4452 10 0.14079    5.44
## 3      c39  d20   2.80 0.2789 10 0.08819    2.80
## 4      c52  d20   3.11 0.7909 10 0.25009    5.91
## 5      c39  d21   2.74 0.9834 10 0.31098    2.74
## 6      c52  d21   1.47 0.2111 10 0.06675    4.21
ggplot(ce, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(stat = "identity") +
    geom_text(aes(y = label_y, label = Weight), vjust = 1.5, colour = "white")
ce <- arrange(cabbage_exp, Date, Cultivar)
# Compute the y position so that each label sits in the middle of its bar segment
ce <- ddply(ce, "Date", transform, label_y = cumsum(Weight) - 0.5 * Weight)
ggplot(ce, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(stat = "identity") +
    geom_text(aes(y = label_y, label = Weight), colour = "white")
# Polishing: paste() appends "kg" to each label and format() keeps two decimal places
ggplot(ce, aes(x = Date, y = Weight, fill = Cultivar)) +
    geom_bar(stat = "identity", colour = "black") +
    geom_text(aes(y = label_y, label = paste(format(Weight, nsmall = 2), "kg")), size = 4) +
    guides(fill = guide_legend(reverse = TRUE)) +
    scale_fill_brewer(palette = "Pastel1")
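
Since the cumulative-sum approach depends on the stacking order, a possible alternative in ggplot2 2.2.0 and later is to let position_stack(vjust = 0.5) place each label at the middle of its own segment. The sketch below uses a stand-in for cabbage_exp and then reads the computed positions back to confirm that the labels sit halfway up each segment.

```r
library(ggplot2)

# Stand-in for cabbage_exp (only the columns needed here)
cabbage <- data.frame(
  Cultivar = rep(c("c39", "c52"), each = 3),
  Date     = rep(c("d16", "d20", "d21"), times = 2),
  Weight   = c(3.18, 2.80, 2.74, 2.26, 3.11, 1.47)
)

# The text layer is stacked the same way as the bars; vjust = 0.5 centres each label
p <- ggplot(cabbage, aes(x = Date, y = Weight, fill = Cultivar)) +
  geom_bar(stat = "identity", colour = "black") +
  geom_text(aes(label = paste(format(Weight, nsmall = 2), "kg")),
            position = position_stack(vjust = 0.5), size = 4)

bars   <- layer_data(p, 1)   # bar segments: ymin and ymax
labels <- layer_data(p, 2)   # text labels: y

# Label heights compared with the midpoints of the bar segments
label_y   <- sort(as.numeric(labels$y))
midpoints <- sort((bars$ymin + bars$ymax) / 2)
isTRUE(all.equal(label_y, midpoints))
```

This avoids the sorting and cumsum() steps entirely, at the cost of requiring a newer ggplot2.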

1.10 Drawing a Cleveland dot plot

library(gcookbook)
tophit <- tophitters2001[1:25, ]  # take the first 25 rows of the tophitters2001 data set
ggplot(tophit, aes(x = avg, y = name)) + geom_point()
tophit[, c("name", "lg", "avg")]
##                 name lg    avg
## 1       Larry Walker NL 0.3501
## 2      Ichiro Suzuki AL 0.3497
## 3       Jason Giambi AL 0.3423
## 4     Roberto Alomar AL 0.3357
## 5        Todd Helton NL 0.3356
## 6        Moises Alou NL 0.3314
## 7      Lance Berkman NL 0.3310
## 8         Bret Boone AL 0.3307
## 9  Frank Catalanotto AL 0.3305
## 10     Chipper Jones NL 0.3304
## 11     Albert Pujols NL 0.3288
## 12       Barry Bonds NL 0.3277
## 13        Sammy Sosa NL 0.3276
## 14       Juan Pierre NL 0.3274
## 15     Juan Gonzalez AL 0.3252
## 16     Luis Gonzalez NL 0.3251
## 17      Rich Aurilia NL 0.3239
## 18      Paul Lo Duca NL 0.3196
## 19        Jose Vidro NL 0.3189
## 20    Alex Rodriguez AL 0.3180
## 21       Cliff Floyd NL 0.3171
## 22   Shannon Stewart AL 0.3156
## 23      Jeff Cirillo NL 0.3125
## 24       Jeff Conine AL 0.3111
## 25       Derek Jeter AL 0.3111
# reorder(name, avg) first converts name to a factor and then orders its levels by avg
ggplot(tophit, aes(x = avg, y = reorder(name, avg))) +
    geom_point(size = 3) +
    theme_bw() +
    theme(panel.grid.major.x = element_blank(),
          panel.grid.minor.x = element_blank(),
          panel.grid.major.y = element_line(colour = "grey60", linetype = "dashed"))
# Swap the x and y axes
ggplot(tophit, aes(x = reorder(name, avg), y = avg)) +
    geom_point(size = 3) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 60, hjust = 1),
          panel.grid.major.y = element_blank(),
          panel.grid.minor.y = element_blank(),
          panel.grid.major.x = element_line(colour = "grey60", linetype = "dashed"))

# Show the grouping variable
# Extract the name variable, ordered first by lg and then by avg
nameorder <- tophit$name[order(tophit$lg, tophit$avg)]
# Convert name to a factor whose levels follow nameorder
tophit$name <- factor(tophit$name, levels = nameorder)
# geom_segment() replaces the grid lines with line segments that end at the data points
ggplot(tophit, aes(x = avg, y = name)) +
    geom_segment(aes(yend = name), xend = 0, colour = "grey50") +
    geom_point(size = 3, aes(colour = lg)) +
    scale_colour_brewer(palette = "Set1", limits = c("NL", "AL")) +
    theme_bw() +
    theme(panel.grid.major.y = element_blank(),
          legend.position = c(1, 0.55),  # place the legend inside the plotting area
          legend.justification = c(1, 0.5))
# Show the groups in separate facets
# guide = "none" hides the legend (written guide = FALSE in older ggplot2 versions)
ggplot(tophit, aes(x = avg, y = name)) +
    geom_segment(aes(yend = name), xend = 0, colour = "grey50") +
    geom_point(size = 3, aes(colour = lg)) +
    scale_colour_brewer(palette = "Set1", limits = c("NL", "AL"), guide = "none") +
    theme_bw() +
    theme(panel.grid.major.y = element_blank()) +
    facet_grid(lg ~ ., scales = "free_y", space = "free_y")
## The order in which the facets are stacked can only be changed by reordering the factor levels of lg
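
Reordering the levels of lg might look like the sketch below. It uses a six-row stand-in, tophit_small, typed in from the first rows of the table above; the choice of putting NL first is only an example.

```r
library(ggplot2)

# First six rows of tophit, typed in from the printed output above
tophit_small <- data.frame(
  name = c("Larry Walker", "Ichiro Suzuki", "Jason Giambi",
           "Roberto Alomar", "Todd Helton", "Moises Alou"),
  lg   = c("NL", "AL", "AL", "AL", "NL", "NL"),
  avg  = c(0.3501, 0.3497, 0.3423, 0.3357, 0.3356, 0.3314)
)

# Order the names by league first, then by batting average
nameorder <- tophit_small$name[order(tophit_small$lg, tophit_small$avg)]
tophit_small$name <- factor(tophit_small$name, levels = nameorder)

# Make NL the first level so that its facet is drawn on top
tophit_small$lg <- factor(tophit_small$lg, levels = c("NL", "AL"))

p <- ggplot(tophit_small, aes(x = avg, y = name)) +
  geom_segment(aes(yend = name), xend = 0, colour = "grey50") +
  geom_point(size = 3, aes(colour = lg)) +
  scale_colour_brewer(palette = "Set1", limits = c("NL", "AL"), guide = "none") +
  theme_bw() +
  theme(panel.grid.major.y = element_blank()) +
  facet_grid(lg ~ ., scales = "free_y", space = "free_y")
```

Swapping the two entries in levels = c("NL", "AL") would put the AL facet on top instead.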

Reference book

  • R Graphics Cookbook, 2nd edition.