[R] Visualization with ggplot2: "R for Data Science", Part 1

"R for Data Science", Chapters 2 and 3 (Data visualisation): notes gathered while working through the book

References

  1. 《R for data science》
  2. 《R数据科学》 (the Chinese edition of R for Data Science)
  3. The Layered Grammar of Graphics.
  4. ggplot2: Points

“The simple graph has brought more information to the data analyst’s mind than any other device.” — John Tukey
“The greatest value of a picture is when it forces us to notice what we never expected to see.” — John Tukey

A graphing template

ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>, 
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>

Aesthetic mappings

library(ggplot2)
library(patchwork) # provides p1 + p2 for placing plots side by side

# Left
p1 <- ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))

# Right
p2 <- ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))  

p1 + p2
# Warning messages:
# 1: Using alpha for a discrete variable is not advised. 
# 2: The shape palette can deal with a maximum of 6 discrete values
# because more than 6 becomes difficult to discriminate; you have
# 7. Consider specifying shapes manually if you must have them. 
# 3: Removed 62 rows containing missing values (geom_point). 

ggplot2 will only use six shapes at a time. By default, additional groups will go unplotted when you use the shape aesthetic.
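The warning above suggests specifying shapes manually when there are more than six groups. A minimal sketch of that idea follows; the shape codes 0 to 6 are an arbitrary choice made for illustration, not something the book prescribes.

```r
library(ggplot2)

# Seven shape codes, one per level of mpg$class.
# The codes 0:6 are an illustrative assumption; any seven valid codes work.
p_shapes <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = 0:6)

# With a manual scale every row keeps a shape, so no class goes unplotted.
n_missing_shapes <- sum(is.na(layer_data(p_shapes)$shape))
```

Printing p_shapes should then show all seven classes, including the one that the default six-shape palette leaves out.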

- How do these aesthetics behave differently for categorical vs. continuous variables?

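'''
color (ordered aesthetic)
1. Mapped to a categorical variable: each category gets its own distinct colour
2. Mapped to a continuous variable: values are placed on a colour gradient with a fixed range

size (ordered aesthetic)
1. Mapped to a categorical variable: each category gets its own point size, even though the categories have no inherent order, and ggplot2 warns
2. Mapped to a continuous variable: point size grows with the value (by default the point area, not the radius, is scaled)

shape (unordered aesthetic)
1. Mapped to a categorical variable: each category gets its own shape; at most 6 are shown at once, and further categories are dropped with a warning
2. Mapped to a continuous variable: not possible; ggplot2 raises an error
'''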
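The behaviour summarised above can be checked without looking at the plots. The sketch below builds three plots and inspects the computed layer data with layer_data(); the variable names (p_cat, p_con, p_bad) are made up for this illustration.

```r
library(ggplot2)

# Categorical variable -> one distinct colour per category (mpg has 7 classes).
p_cat <- ggplot(mpg) + geom_point(aes(x = displ, y = hwy, colour = class))
n_cat_colours <- length(unique(layer_data(p_cat)$colour))

# Continuous variable -> colours are taken from a gradient.
p_con <- ggplot(mpg) + geom_point(aes(x = displ, y = hwy, colour = cty))
n_con_colours <- length(unique(layer_data(p_con)$colour))

# Continuous variable -> shape: the plot object can be created,
# but building (or printing) it raises an error.
p_bad <- ggplot(mpg) + geom_point(aes(x = displ, y = hwy, shape = cty))
shape_fails <- tryCatch({ ggplot_build(p_bad); FALSE },
                        error = function(e) TRUE)
```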

- Variable types in mpg
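One way to list the variable types, as a small sketch: the helper names mpg_types, categorical_vars and continuous_vars are introduced here for illustration only.

```r
library(ggplot2)

# First class of every column in mpg.
mpg_types <- vapply(mpg, function(column) class(column)[1], character(1))
print(mpg_types)

# Character columns are the categorical variables;
# numeric and integer columns are treated as continuous by ggplot2.
categorical_vars <- names(mpg_types)[mpg_types == "character"]
continuous_vars  <- names(mpg_types)[mpg_types %in% c("numeric", "integer")]
```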

  • The stroke aesthetic (thickness of a point's outline)
p1 <- ggplot(mpg,aes(x = displ, y = hwy)) +
  geom_point(shape = 1)

p2 <- ggplot(mpg,aes(x = displ, y = hwy)) +
  geom_point(shape = 1,stroke = 2)

p1 + p2

Facets

- Wrapped facets: facet_wrap()

ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

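facet_wrap() takes several other arguments (see ?facet_wrap), for example:

# strip.position controls on which side of each panel the strip label is placed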
p1 <- ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2, strip.position = 'bottom')

p2 <- ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2, strip.position = 'right')

p1 + p2
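A few more facet_wrap() arguments in one place, as a rough sketch. The particular combination (four columns, free y scales, vertical fill order) is only an illustrative choice.

```r
library(ggplot2)

# ncol sets the number of columns, scales = "free_y" gives each panel its own
# y-axis range, and dir = "v" fills the panels column by column.
p_free <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  facet_wrap(~ class, ncol = 4, scales = "free_y", dir = "v")

# One panel per class.
n_panels <- length(unique(layer_data(p_free)$PANEL))
```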

- Showing the full data set in every facet

ggplot(mpg, aes(displ, hwy)) +
  geom_point(data = transform(mpg, class = NULL), 
             colour = "grey85") +
  geom_point() +
  facet_wrap(~ class)

- Grid facets: facet_grid()

# A . means no faceting along that dimension (rows or columns)
p1 <- ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .) # rows ~ columns

p2 <- ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)

p1 + p2

Geometric objects

- Hiding the legend and the confidence interval

p1 <- ggplot(mpg) +
  geom_smooth(aes(x = displ, y = hwy))

p2 <- ggplot(mpg,aes(x = displ, y = hwy, group = drv)) +
  geom_smooth(se = FALSE)

p3 <- ggplot(mpg) +
  geom_smooth(
    aes(x = displ, y = hwy, color = drv),
    show.legend = FALSE)

p1 + p2 + p3

- Combining with filter()

library(dplyr) # filter() here is dplyr::filter(), not stats::filter()

ggplot(mpg, aes(x = displ, y = hwy)) + 
  geom_point(aes(color = class)) + 
  geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)

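- Plotting details

Both plots draw coloured points with a white outline. In the first, the white outlines stay visible where points overlap; in the second, all white discs are drawn before the coloured points, so no white shows inside overlapping clusters.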

p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(fill=drv),shape=21,color='white',size=2.5,stroke=1.5)

p2 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color='white',size=3.5)+
  geom_point(aes(color=drv),shape=16,size=2.3)

p1 + p2

Statistical transformations

Bar charts, histograms, and frequency polygons bin your data and then plot bin counts, the number of points that fall in each bin.
Smoothers fit a model to your data and then plot predictions from the model.
Boxplots compute a robust summary of the distribution and then display a specially formatted box.

- Some common interchangeable geoms and stats

You can generally use geoms and stats interchangeably. For example, you can recreate the previous plot using stat_count() instead of geom_bar():

ggplot(data = diamonds) + 
  stat_count(mapping = aes(x = cut))
# is equivalent to
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut), stat = "count") # "count" is the default stat, so it can be omitted

# fun, fun.min and fun.max replace fun.y, fun.ymin and fun.ymax (deprecated since ggplot2 3.3.0)
ggplot(data = diamonds) +
  geom_pointrange(
    mapping = aes(x = cut, y = depth),
    stat = "summary",
    fun.min = min,
    fun.max = max,
    fun = median
  )
# is equivalent to
ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )

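# The same plot can also be reproduced by hand
# (assumes library(dplyr) is loaded for %>%, group_by() and summarise())
ggplot(diamonds, aes(cut, depth)) + 
  geom_line(size = 1) + 
  # a layer that uses different data must name it explicitly with data = ...
  geom_point(data = diamonds %>%   
               group_by(cut) %>% 
               summarise(depth_median = median(depth)),
             aes(cut, depth_median), size = 2) 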

- Overriding the default mapping

ggplot(diamonds) + 
  geom_bar(aes(x = cut, y = stat(prop), group = 1, fill = stat(prop)))
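# is equivalent to (stat(prop) and ..prop.. are both older spellings; ggplot2 3.3.0 and later also accept after_stat(prop))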
p1 <- ggplot(diamonds) + 
  geom_bar(aes(x = cut, y = ..prop.., group = 1, fill = ..prop..))

p2 <- ggplot(diamonds) + 
  geom_bar(aes(x = cut, y = ..prop.., group = color, fill = color))

p1 + p2
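For reference, the same proportion bar chart written with after_stat(), as a sketch that assumes ggplot2 3.3.0 or later is installed. The names p_prop and prop_total are illustrative.

```r
library(ggplot2)

# after_stat(prop) refers to the prop variable computed by stat_count();
# group = 1 makes the proportions relative to the whole data set.
p_prop <- ggplot(diamonds) +
  geom_bar(aes(x = cut, y = after_stat(prop), group = 1))

# The bar heights are proportions, so they add up to 1.
prop_total <- sum(layer_data(p_prop)$y)
```

Without group = 1, each cut forms its own group and every bar would have a proportion of 1.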

- What does geom_col() do? How is it different to geom_bar()?

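  1. geom_col() also draws bar charts, but it uses stat "identity", meaning no statistical transformation: bar heights are taken directly from the y values in the data
  2. geom_bar() uses stat "count" by default: bar heights are the number of rows in each x group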
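The difference can be seen by drawing the same chart both ways. In this sketch the counts are precomputed with base R's table(); the data frame name cut_counts and its column n are chosen for illustration.

```r
library(ggplot2)

# geom_bar() counts the rows of the raw data itself.
p_bar <- ggplot(diamonds) +
  geom_bar(aes(x = cut))

# geom_col() expects the heights to be in the data already,
# so the counts are computed first.
cut_counts <- as.data.frame(table(cut = diamonds$cut), responseName = "n")
p_col <- ggplot(cut_counts) +
  geom_col(aes(x = cut, y = n))

# Both plots end up with the same bar heights.
bar_heights <- layer_data(p_bar)$y
col_heights <- layer_data(p_col)$y
```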

- Most geoms and stats come in pairs that are almost always used in concert. Read through the documentation and make a list of all the pairs. What do they have in common?

Position adjustments

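position = "identity" places each object exactly where it falls in the plot; objects then overlap one another, so it is rarely suitable for bars
position = "fill" draws stacked bars scaled to the same height, showing proportions
position = "dodge" places bars side by side
position = "stack" stacks objects on top of one another
position = "jitter" adds random noise to each position; typically used for scatterplots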
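To make the bar-chart adjustments concrete, here is a small sketch on the diamonds data. It compares the tallest bar under each adjustment; the names base, top_stack, top_fill and top_dodge are illustrative.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(x = cut, fill = clarity))

p_stack <- base + geom_bar(position = "stack")
p_fill  <- base + geom_bar(position = "fill")
p_dodge <- base + geom_bar(position = "dodge")

# Upper edge of the tallest bar segment under each adjustment:
# stack -> largest total count of any cut,
# fill  -> always 1, because every bar is rescaled to the same height,
# dodge -> largest count of any single cut/clarity combination.
top_stack <- max(layer_data(p_stack)$ymax)
top_fill  <- max(layer_data(p_fill)$ymax)
top_dodge <- max(layer_data(p_dodge)$ymax)
```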

An example borrowed from Dr. Liu:

library(ggplot2)
library(patchwork)

v <- data.frame(x = 1:20, 
                y = runif(40,min = 10,max = 20),
                z = rep(c("A","B"),each = 20))
                
p1 <- ggplot(v, aes(x, y, fill = z))+
  geom_area(position = position_dodge(), alpha = 0.5) +
  labs(title = "position_dodge()")

p2 <- ggplot(v, aes(x, y, fill = z))+
  geom_area(position = position_fill(), alpha = 0.5) +
  labs(title = "position_fill()")

p3 <- ggplot(v, aes(x, y, fill = z))+
  geom_area(position = position_stack(), alpha = 0.5) +
  labs(title = "position_stack()")

p4 <- ggplot(v, aes(x, y, fill = z))+
  geom_area(position = position_identity(), alpha = 0.5) +
  labs(title = "position_identity()")

p5 <- ggplot(v, aes(x, y, fill = z))+
  geom_area(position = position_jitter(), alpha = 0.5) +
  labs(title = "position_jitter(), usually for point")

(p1 + p2 + p3)/(p4 + p5) 
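
  • Jittering with geom_jitter()

geom_jitter() adds a small amount of random noise to each point's position
geom_count() counts the observations at each location and maps the count to point size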

p1 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point()

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter()
# is equivalent to
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(position = position_jitter())
# is equivalent to
p2 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(position = 'jitter')

# geom_count()
p3 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_count()
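Because the jitter is random, the plot changes on every draw. A sketch of how to control it with position_jitter(): the width, height and seed values here are arbitrary illustrative choices.

```r
library(ggplot2)

# width and height bound the noise in each direction;
# seed makes the random offsets reproducible.
jittered <- function() {
  ggplot(mpg, aes(x = cty, y = hwy)) +
    geom_point(position = position_jitter(width = 0.3, height = 0.3, seed = 1))
}

# Building the plot twice gives the same jittered coordinates.
first_draw  <- layer_data(jittered())
second_draw <- layer_data(jittered())
```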

Coordinate systems

- coord_flip()

coord_flip() switches the x and y axes. This is useful (for example), if you want horizontal boxplots. It’s also useful for long labels: it’s hard to get them to fit without overlapping on the x-axis.

p1 <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot()

p2 <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot() +
  coord_flip()

p1 + p2 

- coord_quickmap()

Helps draw maps with the correct aspect ratio:

coord_quickmap() sets the aspect ratio correctly for maps. This is very important if you’re plotting spatial data with ggplot2.

# map_data() needs the maps package to be installed
nz <- map_data("nz")

p1 <- ggplot(nz, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "black")

p2 <- ggplot(nz, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "black") +
  coord_quickmap()

p1 + p2 

- coord_polar()

bar <- ggplot(data = diamonds) + 
  geom_bar(
    mapping = aes(x = cut, fill = cut), 
    show.legend = FALSE,
    width = 1
  ) + 
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)

p1 <- bar + coord_flip()
p2 <- bar + coord_polar()

p1 + p2 

Going further:

- Turn a stacked bar chart into a pie chart using coord_polar()

p1 <- ggplot(diamonds) +
  geom_bar(aes(x = cut, fill = clarity)) + 
  coord_polar()

p2 <- ggplot(diamonds) +
  geom_bar(aes(x = cut, fill = clarity),
           position = 'fill') + 
  coord_polar()

# theta: the variable to map angle to (x or y)
# i.e. the share taken by each value is computed and then mapped to an angle
p3 <- ggplot(diamonds) +
  geom_bar(aes(x = cut, fill = clarity),
           position = 'fill') + 
  coord_polar(theta = "y")

p1 + p2 + p3
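For a conventional pie chart, one common approach is to start from a single stacked bar and map its height to the angle. The sketch below assumes a dummy x value, factor(1), so that all cuts stack into one bar; the name pie is illustrative.

```r
library(ggplot2)

# One stacked bar (x is a constant), then y is mapped to the angle.
pie <- ggplot(diamonds, aes(x = factor(1), fill = cut)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y") +
  labs(x = NULL, y = NULL)

# One slice per cut.
n_slices <- length(unique(layer_data(pie)$fill))
```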

- What does the plot below tell you about the relationship between city and highway mpg? Why is coord_fixed() important? What does geom_abline() do?

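'''
City and highway fuel efficiency are positively correlated.
coord_fixed() fixes the ratio between the units of the x and y axes (1:1 by default).
geom_abline() draws a straight reference line; by default it has slope 1 and intercept 0, which appears at 45 degrees under coord_fixed().
The intercept and slope arguments set the line's intercept and slope.
'''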

p1 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() + 
  geom_abline() +
  coord_fixed()

p2 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline(intercept=-5,slope=1) +
  coord_fixed()

p1 + p2
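
The positive relationship read off the plot can also be checked numerically. A minimal sketch: the names cty_hwy_cor and share_above_line are introduced here for illustration.

```r
library(ggplot2)

# Pearson correlation between city and highway mileage.
cty_hwy_cor <- cor(mpg$cty, mpg$hwy)

# Share of cars lying above the default reference line y = x,
# i.e. cars whose highway mileage exceeds their city mileage.
share_above_line <- mean(mpg$hwy > mpg$cty)

print(c(correlation = cty_hwy_cor, share_above = share_above_line))
```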