
我的总结:
- 按值筛选观测,
filter()。 - 按名称选取变量,
select()。 - 对行进行重新排序,
arrange()。 - 使用现有变量的函数创建新变量,
mutate()。 - 将多个值总结为一个摘要统计量,
summarize()。
libarry(dplyr)
看下示例数据starwars
starwars
# A tibble: 87 × 14
name height mass hair_color skin_color eye_color birth_year sex gender homeworld species films vehicles
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <lis> <list>
1 Luke Skywalk… 172 77 blond fair blue 19 male mascu… Tatooine Human <chr> <chr>
2 C-3PO 167 75 NA gold yellow 112 none mascu… Tatooine Droid <chr> <chr>
3 R2-D2 96 32 NA white, bl… red 33 none mascu… Naboo Droid <chr> <chr>
4 Darth Vader 202 136 none white yellow 41.9 male mascu… Tatooine Human <chr> <chr>
5 Leia Organa 150 49 brown light brown 19 fema… femin… Alderaan Human <chr> <chr>
6 Owen Lars 178 120 brown, gr… light blue 52 male mascu… Tatooine Human <chr> <chr>
7 Beru Whitesu… 165 75 brown light blue 47 fema… femin… Tatooine Human <chr> <chr>
8 R5-D4 97 32 NA white, red red NA none mascu… Tatooine Droid <chr> <chr>
9 Biggs Darkli… 183 84 black light brown 24 male mascu… Tatooine Human <chr> <chr>
10 Obi-Wan Keno… 182 77 auburn, w… fair blue-gray 57 male mascu… Stewjon Human <chr> <chr>
# ℹ 77 more rows
# ℹ 1 more variable: starships <list>
# ℹ Use `print(n = ...)` to see more rows
-
filter():筛选行
starwars %>%
filter(species == "Droid")
# A tibble: 6 × 14
name height mass hair_color skin_color eye_color birth_year sex gender homeworld species films vehicles
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <list> <list>
1 C-3PO 167 75 NA gold yellow 112 none masculine Tatooine Droid <chr [6]> <chr>
2 R2-D2 96 32 NA white, blue red 33 none masculine Naboo Droid <chr [7]> <chr>
3 R5-D4 97 32 NA white, red red NA none masculine Tatooine Droid <chr [1]> <chr>
4 IG-88 200 140 none metal red 15 none masculine NA Droid <chr [1]> <chr>
5 R4-P17 96 NA none silver, red red, blue NA none feminine NA Droid <chr [2]> <chr>
6 BB8 NA NA none none black NA none masculine NA Droid <chr [1]> <chr>
# ℹ 1 more variable: starships <list>
-
select():筛选列
starwars %>%
select(name, ends_with("color"))
# A tibble: 87 × 4
name hair_color skin_color eye_color
<chr> <chr> <chr> <chr>
1 Luke Skywalker blond fair blue
2 C-3PO NA gold yellow
3 R2-D2 NA white, blue red
4 Darth Vader none white yellow
5 Leia Organa brown light brown
6 Owen Lars brown, grey light blue
7 Beru Whitesun Lars brown light blue
8 R5-D4 NA white, red red
9 Biggs Darklighter black light brown
10 Obi-Wan Kenobi auburn, white fair blue-gray
# ℹ 77 more rows
# ℹ Use `print(n = ...)` to see more rows
-
mutate():在数据框里新增一列或改写一列,原地返回一个“改完以后”的数据框。
starwars %>%
mutate(name, bmi = mass / ((height / 100) ^ 2)) %>%
select(name:mass, bmi)
# A tibble: 87 × 4
name height mass bmi
<chr> <int> <dbl> <dbl>
1 Luke Skywalker 172 77 26.0
2 C-3PO 167 75 26.9
3 R2-D2 96 32 34.7
4 Darth Vader 202 136 33.3
5 Leia Organa 150 49 21.8
6 Owen Lars 178 120 37.9
7 Beru Whitesun Lars 165 75 27.5
8 R5-D4 97 32 34.0
9 Biggs Darklighter 183 84 25.1
10 Obi-Wan Kenobi 182 77 23.2
# ℹ 77 more rows
# ℹ Use `print(n = ...)` to see more rows
-
arrange():按给定列(或表达式)把整行重新排序。默认升序;降序用 desc(x)
starwars %>%
arrange(desc(mass))
# A tibble: 87 × 14
name height mass hair_color skin_color eye_color birth_year sex gender homeworld species films vehicles
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <lis> <list>
1 Jabba Desili… 175 1358 NA green-tan… orange 600 herm… mascu… Nal Hutta Hutt <chr> <chr>
2 Grievous 216 159 none brown, wh… green, y… NA male mascu… Kalee Kaleesh <chr> <chr>
3 IG-88 200 140 none metal red 15 none mascu… NA Droid <chr> <chr>
4 Darth Vader 202 136 none white yellow 41.9 male mascu… Tatooine Human <chr> <chr>
5 Tarfful 234 136 brown brown blue NA male mascu… Kashyyyk Wookiee <chr> <chr>
6 Owen Lars 178 120 brown, gr… light blue 52 male mascu… Tatooine Human <chr> <chr>
7 Bossk 190 113 none green red 53 male mascu… Trandosha Trando… <chr> <chr>
8 Chewbacca 228 112 brown unknown blue 200 male mascu… Kashyyyk Wookiee <chr> <chr>
9 Jek Tono Por… 180 110 brown fair blue NA NA NA Bestine … NA <chr> <chr>
10 Dexter Jetts… 198 102 none brown yellow NA male mascu… Ojom Besali… <chr> <chr>
# ℹ 77 more rows
# ℹ 1 more variable: starships <list>
# ℹ Use `print(n = ...)` to see more rows
-
group_by():把数据框“标记”成若干分组,后续summarise()/mutate()/filter()等会按组独立计算,而不是整表一把梭。 -
summarise():“把已经分组(或未分组)的数据框,按组(或整表)压缩成一行汇总结果。”
每一组(或整表)只会留下一行。
汇总指标用聚合函数计算:n()、mean()、sum()、max()、first()…
如果前面用了 group_by(),就每组一行;没分组就整张表一行。
汇总后自动去掉分组
starwars %>%
group_by(species) %>%
summarise(
n = n(),
mass = mean(mass, na.rm = TRUE)
) %>%
filter(
n > 1,
mass > 50
)
# A tibble: 9 × 3
species n mass
<chr> <int> <dbl>
1 Droid 6 69.8
2 Gungan 3 74
3 Human 35 81.3
4 Kaminoan 2 88
5 Mirialan 2 53.1
6 Twi'lek 2 55
7 Wookiee 2 124
8 Zabrak 2 80
9 NA 4 81
修改列名
- 只想“把 A 改成 B”——用
rename() - 想“批量改”——用
rename_with()
构建一个示例数据
df <- tibble(old1 = 1:3, old2 = letters[1:3], old3 = rnorm(3))
df
# A tibble: 3 × 3
old1 old2 old3
<int> <chr> <dbl>
1 1 a -0.448
2 2 b -0.729
3 3 c -0.976
使用rename()
df2 <- df %>% rename(
new1 = old1, # 左边是新名字,右边是旧名字
新列2 = old2 # 中文也行
)
df2
# A tibble: 3 × 3
new1 新列2 old3
<int> <chr> <dbl>
1 1 a 0.0896
2 2 b -0.921
3 3 c -0.122
使用rename_with()
全体加前缀
df %>% rename_with(~ paste0("pre_", .x))
# A tibble: 3 × 3
pre_old1 pre_old2 pre_old3
<int> <chr> <dbl>
1 1 a -0.448
2 2 b -0.729
3 3 c -0.976
全体加后缀
df %>% rename_with(~ paste0(.x, "_suf"))
# A tibble: 3 × 3
old1_suf old2_suf old3_suf
<int> <chr> <dbl>
1 1 a -0.448
2 2 b -0.729
3 3 c -0.976
只改第 2、3 列
df %>% rename_with(toupper, .cols = 2:3)
# A tibble: 3 × 3
old1 OLD2 OLD3
<int> <chr> <dbl>
1 1 a -0.448
2 2 b -0.729
3 3 c -0.976
只改以 old 开头的列
df
# A tibble: 3 × 3
old1 old2 old3
<int> <chr> <dbl>
1 1 a -0.448
2 2 b -0.729
3 3 c -0.976
df %>% rename_with(~ sub("^old", "new", .x), starts_with("old"))
# A tibble: 3 × 3
new1 new2 new3
<int> <chr> <dbl>
1 1 a -0.448
2 2 b -0.729
3 3 c -0.976
问题1:我想批量修改指定列
df
# A tibble: 3 × 3
old1 old2 old3
<int> <chr> <dbl>
1 1 a -0.448
2 2 b -0.729
3 3 c -0.976
df %>% rename_with(
.fn = ~ c("x1", "x2"),
.cols = 1:2
)
# A tibble: 3 × 3
x1 x2 old3
<int> <chr> <dbl>
1 1 a -0.448
2 2 b -0.729
3 3 c -0.976