dplyr包学习笔记

我的总结

  1. 按值筛选观测,filter()
  2. 按名称选取变量,select()
  3. 对行进行重新排序,arrange()
  4. 使用现有变量的函数创建新变量,mutate()
  5. 将多个值总结为一个摘要统计量,summarize()
libarry(dplyr)

看下示例数据starwars

starwars

# A tibble: 87 × 14
   name          height  mass hair_color skin_color eye_color birth_year sex   gender homeworld species films vehicles
   <chr>          <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>  <chr>     <chr>   <lis> <list>  
 1 Luke Skywalk…    172    77 blond      fair       blue            19   male  mascu… Tatooine  Human   <chr> <chr>   
 2 C-3PO            167    75 NA         gold       yellow         112   none  mascu… Tatooine  Droid   <chr> <chr>   
 3 R2-D2             96    32 NA         white, bl… red             33   none  mascu… Naboo     Droid   <chr> <chr>   
 4 Darth Vader      202   136 none       white      yellow          41.9 male  mascu… Tatooine  Human   <chr> <chr>   
 5 Leia Organa      150    49 brown      light      brown           19   fema… femin… Alderaan  Human   <chr> <chr>   
 6 Owen Lars        178   120 brown, gr… light      blue            52   male  mascu… Tatooine  Human   <chr> <chr>   
 7 Beru Whitesu…    165    75 brown      light      blue            47   fema… femin… Tatooine  Human   <chr> <chr>   
 8 R5-D4             97    32 NA         white, red red             NA   none  mascu… Tatooine  Droid   <chr> <chr>   
 9 Biggs Darkli…    183    84 black      light      brown           24   male  mascu… Tatooine  Human   <chr> <chr>   
10 Obi-Wan Keno…    182    77 auburn, w… fair       blue-gray       57   male  mascu… Stewjon   Human   <chr> <chr>   
# ℹ 77 more rows
# ℹ 1 more variable: starships <list>
# ℹ Use `print(n = ...)` to see more rows
  1. filter():筛选行
starwars %>% 
  filter(species == "Droid")

# A tibble: 6 × 14
  name   height  mass hair_color skin_color  eye_color birth_year sex   gender    homeworld species films     vehicles
  <chr>   <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> <chr>     <chr>     <chr>   <list>    <list>  
1 C-3PO     167    75 NA         gold        yellow           112 none  masculine Tatooine  Droid   <chr [6]> <chr>   
2 R2-D2      96    32 NA         white, blue red               33 none  masculine Naboo     Droid   <chr [7]> <chr>   
3 R5-D4      97    32 NA         white, red  red               NA none  masculine Tatooine  Droid   <chr [1]> <chr>   
4 IG-88     200   140 none       metal       red               15 none  masculine NA        Droid   <chr [1]> <chr>   
5 R4-P17     96    NA none       silver, red red, blue         NA none  feminine  NA        Droid   <chr [2]> <chr>   
6 BB8        NA    NA none       none        black             NA none  masculine NA        Droid   <chr [1]> <chr>   
# ℹ 1 more variable: starships <list>
  1. select():筛选列
starwars %>% 
  select(name, ends_with("color"))

# A tibble: 87 × 4
   name               hair_color    skin_color  eye_color
   <chr>              <chr>         <chr>       <chr>    
 1 Luke Skywalker     blond         fair        blue     
 2 C-3PO              NA            gold        yellow   
 3 R2-D2              NA            white, blue red      
 4 Darth Vader        none          white       yellow   
 5 Leia Organa        brown         light       brown    
 6 Owen Lars          brown, grey   light       blue     
 7 Beru Whitesun Lars brown         light       blue     
 8 R5-D4              NA            white, red  red      
 9 Biggs Darklighter  black         light       brown    
10 Obi-Wan Kenobi     auburn, white fair        blue-gray
# ℹ 77 more rows
# ℹ Use `print(n = ...)` to see more rows
  1. mutate():在数据框里新增一列或改写一列,原地返回一个“改完以后”的数据框。
starwars %>% 
  mutate(name, bmi = mass / ((height / 100)  ^ 2)) %>%
  select(name:mass, bmi)

# A tibble: 87 × 4
   name               height  mass   bmi
   <chr>               <int> <dbl> <dbl>
 1 Luke Skywalker        172    77  26.0
 2 C-3PO                 167    75  26.9
 3 R2-D2                  96    32  34.7
 4 Darth Vader           202   136  33.3
 5 Leia Organa           150    49  21.8
 6 Owen Lars             178   120  37.9
 7 Beru Whitesun Lars    165    75  27.5
 8 R5-D4                  97    32  34.0
 9 Biggs Darklighter     183    84  25.1
10 Obi-Wan Kenobi        182    77  23.2
# ℹ 77 more rows
# ℹ Use `print(n = ...)` to see more rows
  1. arrange():按给定列(或表达式)把整行重新排序。默认升序;降序用 desc(x)
starwars %>% 
  arrange(desc(mass))

# A tibble: 87 × 14
   name          height  mass hair_color skin_color eye_color birth_year sex   gender homeworld species films vehicles
   <chr>          <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>  <chr>     <chr>   <lis> <list>  
 1 Jabba Desili…    175  1358 NA         green-tan… orange         600   herm… mascu… Nal Hutta Hutt    <chr> <chr>   
 2 Grievous         216   159 none       brown, wh… green, y…       NA   male  mascu… Kalee     Kaleesh <chr> <chr>   
 3 IG-88            200   140 none       metal      red             15   none  mascu… NA        Droid   <chr> <chr>   
 4 Darth Vader      202   136 none       white      yellow          41.9 male  mascu… Tatooine  Human   <chr> <chr>   
 5 Tarfful          234   136 brown      brown      blue            NA   male  mascu… Kashyyyk  Wookiee <chr> <chr>   
 6 Owen Lars        178   120 brown, gr… light      blue            52   male  mascu… Tatooine  Human   <chr> <chr>   
 7 Bossk            190   113 none       green      red             53   male  mascu… Trandosha Trando… <chr> <chr>   
 8 Chewbacca        228   112 brown      unknown    blue           200   male  mascu… Kashyyyk  Wookiee <chr> <chr>   
 9 Jek Tono Por…    180   110 brown      fair       blue            NA   NA    NA     Bestine … NA      <chr> <chr>   
10 Dexter Jetts…    198   102 none       brown      yellow          NA   male  mascu… Ojom      Besali… <chr> <chr>   
# ℹ 77 more rows
# ℹ 1 more variable: starships <list>
# ℹ Use `print(n = ...)` to see more rows
  1. group_by():把数据框“标记”成若干分组,后续summarise() / mutate() / filter() 等会按组独立计算,而不是整表一把梭。
  2. summarise():“把已经分组(或未分组)的数据框,按组(或整表)压缩成一行汇总结果。”
    每一组(或整表)只会留下一行。
    汇总指标用聚合函数计算:n()、mean()、sum()、max()、first()…
    如果前面用了 group_by(),就每组一行;没分组就整张表一行。
    汇总后自动去掉分组
starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(
    n > 1,
    mass > 50
  )

# A tibble: 9 × 3
  species      n  mass
  <chr>    <int> <dbl>
1 Droid        6  69.8
2 Gungan       3  74  
3 Human       35  81.3
4 Kaminoan     2  88  
5 Mirialan     2  53.1
6 Twi'lek      2  55  
7 Wookiee      2 124  
8 Zabrak       2  80  
9 NA           4  81  

修改列名

  1. 只想“把 A 改成 B”——用rename()
  2. 想“批量改”——用rename_with()

构建一个示例数据

df <- tibble(old1 = 1:3, old2 = letters[1:3], old3 = rnorm(3))

df
# A tibble: 3 × 3
   old1 old2    old3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976

使用rename()

df2 <- df %>% rename(
  new1 = old1,          # 左边是新名字,右边是旧名字
  新列2  = old2         # 中文也行
)

df2
# A tibble: 3 × 3
   new1 新列2    old3
  <int> <chr>   <dbl>
1     1 a      0.0896
2     2 b     -0.921 
3     3 c     -0.122

使用rename_with()

全体加前缀

df %>% rename_with(~ paste0("pre_", .x))

# A tibble: 3 × 3
  pre_old1 pre_old2 pre_old3
     <int> <chr>       <dbl>
1        1 a          -0.448
2        2 b          -0.729
3        3 c          -0.976

全体加后缀

df %>% rename_with(~ paste0(.x, "_suf"))   

# A tibble: 3 × 3
  old1_suf old2_suf old3_suf
     <int> <chr>       <dbl>
1        1 a          -0.448
2        2 b          -0.729
3        3 c          -0.976

只改第 2、3 列

df %>% rename_with(toupper, .cols = 2:3)

# A tibble: 3 × 3
   old1 OLD2    OLD3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976

只改以 old 开头的列

df
# A tibble: 3 × 3
   old1 old2    old3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976

df %>% rename_with(~ sub("^old", "new", .x), starts_with("old"))

# A tibble: 3 × 3
   new1 new2    new3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976

问题1:我想批量修改指定列

df
# A tibble: 3 × 3
   old1 old2    old3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976

df %>% rename_with(
  .fn = ~ c("x1", "x2"),
  .cols = 1:2
)
# A tibble: 3 × 3
     x1 x2      old3
  <int> <chr>  <dbl>
1     1 a     -0.448
2     2 b     -0.729
3     3 c     -0.976

附录

代码:
https://github.com/wPencil/MyNotes/blob/8dfb444bfc0542f051f971fb62b4b8078911d36f/R/dplyr_learn_notes.R

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

友情链接更多精彩内容