R Language Study Notes

Getting Started with R: Data Transformation with dplyr

install.packages("tidyverse")
install.packages("nycflights13")  # install once before first use
library(nycflights13)  # flight data
library(tidyverse)

?flights  # open the help page describing the dataset
flights   # print the flight data

3. Selecting columns with select() and renaming columns with rename()

Syntax: select(data_frame, columns to keep)

3.1 Selecting columns by name

select(flights,year,month,day)
# Output:
# A tibble: 336,776 x 3
    year month   day
   <int> <int> <int>
 1  2013     1     1
 2  2013     1     1
 3  2013     1     1
 4  2013     1     1
 5  2013     1     1
 6  2013     1     1
 7  2013     1     1
 8  2013     1     1
 9  2013     1     1
10  2013     1     1
# ... with 336,766 more rows
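A small sketch, not part of the original notes, showing the same selection written with the pipe operator %>% (re-exported by dplyr), and checking that select() returns a new tibble instead of changing flights. It assumes dplyr and nycflights13 are installed.

```r
library(dplyr)
library(nycflights13)

# Same selection as select(flights, year, month, day), written with the pipe
ymd <- flights %>% select(year, month, day)

# Columns come back in the order they were listed
names(ymd)
# select() returns a new tibble; flights itself still has all 19 columns
ncol(flights)
```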

3.2 Selecting all columns between two columns

Use the form A:B to select every column from A through B, including A and B themselves.

select(flights,year:day)
# Output:
# A tibble: 336,776 x 3
    year month   day
   <int> <int> <int>
 1  2013     1     1
 2  2013     1     1
 3  2013     1     1
 4  2013     1     1
 5  2013     1     1
 6  2013     1     1
 7  2013     1     1
 8  2013     1     1
 9  2013     1     1
10  2013     1     1
# ... with 336,766 more rows
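A quick check (an illustration added here, assuming dplyr and nycflights13 are loaded) that year:day expands to exactly the columns between year and day. Because the range is resolved to column positions, writing it backwards reverses the order.

```r
library(dplyr)
library(nycflights13)

by_range <- select(flights, year:day)
by_name  <- select(flights, year, month, day)

# TRUE: year:day covers year, month and day, inclusive
identical(by_range, by_name)

# A reversed range returns the same columns in reverse order
reversed <- select(flights, day:year)
names(reversed)
```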

3.3 Selecting all columns outside a range

Use the form -(A:B) to select every column outside the range A through B; A and B themselves are also dropped.

select(flights,-(year:day))
# Output:
# A tibble: 336,776 x 16
   dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
      <int>          <int>     <dbl>    <int>          <int>     <dbl>
 1      517            515         2      830            819        11
 2      533            529         4      850            830        20
 3      542            540         2      923            850        33
 4      544            545        -1     1004           1022       -18
 5      554            600        -6      812            837       -25
 6      554            558        -4      740            728        12
 7      555            600        -5      913            854        19
 8      557            600        -3      709            723       -14
 9      557            600        -3      838            846        -8
10      558            600        -2      753            745         8
# ... with 336,766 more rows, and 10 more variables: carrier <chr>,
#   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
#   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
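A short sketch, added for illustration and assuming dplyr and nycflights13 are loaded, confirming that -(year:day) is the same as dropping the three columns one by one, which leaves 19 - 3 = 16 columns.

```r
library(dplyr)
library(nycflights13)

drop_range <- select(flights, -(year:day))
drop_each  <- select(flights, -year, -month, -day)

identical(drop_range, drop_each)  # both drop year, month and day
ncol(drop_range)                  # 16 columns remain
```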

3.3 Helper functions to use with select()

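starts_with("xxx")   selects columns whose names begin with xxx
ends_with("xxx")     selects columns whose names end with xxx
contains("xxx")      selects columns whose names contain xxx
matches("(.)\\1")    selects columns whose names match a regular expression; this pattern matches names containing a repeated character
num_range("x", 1:3)  selects numbered columns such as x1, x2 and x3
everything()         selects all remaining columns; listed after some named columns, it moves those columns to the front of the data frame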

3.3.1 starts_with()

Select the columns whose names start with arr

select(flights,starts_with("arr"))
# Output:
# A tibble: 336,776 x 2
   arr_time arr_delay
      <int>     <dbl>
 1      830        11
 2      850        20
 3      923        33
 4     1004       -18
 5      812       -25
 6      740        12
 7      913        19
 8      709       -14
 9      838        -8
10      753         8
# ... with 336,766 more rows

3.3.2 ends_with()

Select the columns whose names end with time

select(flights,ends_with("time"))
# Output:
# A tibble: 336,776 x 5
   dep_time sched_dep_time arr_time sched_arr_time air_time
      <int>          <int>    <int>          <int>    <dbl>
 1      517            515      830            819      227
 2      533            529      850            830      227
 3      542            540      923            850      160
 4      544            545     1004           1022      183
 5      554            600      812            837      116
 6      554            558      740            728      150
 7      555            600      913            854      158
 8      557            600      709            723       53
 9      557            600      838            846      140
10      558            600      753            745      138
# ... with 336,766 more rows

3.3.3 contains()

Select the columns whose names contain dep

select(flights,contains("dep"))
# Output:
# A tibble: 336,776 x 3
   dep_time sched_dep_time dep_delay
      <int>          <int>     <dbl>
 1      517            515         2
 2      533            529         4
 3      542            540         2
 4      544            545        -1
 5      554            600        -6
 6      554            558        -4
 7      555            600        -5
 8      557            600        -3
 9      557            600        -3
10      558            600        -2
# ... with 336,766 more rows
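Helpers can be combined in one select() call, and a minus sign removes whatever a helper matches. The sketch below is an added illustration, assuming dplyr and nycflights13 are loaded: it keeps the dep columns but drops the scheduled one. It also shows that contains() ignores case by default (its ignore.case argument defaults to TRUE).

```r
library(dplyr)
library(nycflights13)

# Columns containing "dep", minus those starting with "sched"
dep_only <- select(flights, contains("dep"), -starts_with("sched"))
names(dep_only)

# Upper-case pattern still matches, because ignore.case = TRUE by default
upper <- select(flights, contains("DEP"))
names(upper)
```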

3.3.4 matches()

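Select the variables whose names contain a repeated character.
This relies on a regular expression: (.) captures any single character and \1 requires that same character to appear again immediately, so the pattern matches names such as arr_time and carrier (both contain "rr"). Inside an R string the backslash must be escaped, hence "(.)\\1".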

select(flights,matches("(.)\\1"))
# Output:
# A tibble: 336,776 x 4
   arr_time sched_arr_time arr_delay carrier
      <int>          <int>     <dbl> <chr>  
 1      830            819        11 UA     
 2      850            830        20 UA     
 3      923            850        33 AA     
 4     1004           1022       -18 B6     
 5      812            837       -25 DL     
 6      740            728        12 UA     
 7      913            854        19 B6     
 8      709            723       -14 EV     
 9      838            846        -8 B6     
10      753            745         8 AA     
# ... with 336,766 more rows
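To see which names the regular expression picks up without calling select(), it can be tested directly against the column names with base R's grepl(). This is an added sketch, assuming nycflights13 is installed; the result should list the same four columns as the output above.

```r
library(nycflights13)

# (.) captures any one character; \\1 requires the same character right after it
pattern <- "(.)\\1"

# Keep only the column names that contain two identical characters in a row
doubled <- names(flights)[grepl(pattern, names(flights))]
doubled
```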

3.3.5 everything()

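Move the chosen columns to the front of the data frame: list them first, then everything() adds all remaining columns in their original order.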

select(flights,dep_time,arr_time,day,month,year,everything())
# Output:
# A tibble: 336,776 x 19
   dep_time arr_time   day month  year sched_dep_time dep_delay
      <int>    <int> <int> <int> <int>          <int>     <dbl>
 1      517      830     1     1  2013            515         2
 2      533      850     1     1  2013            529         4
 3      542      923     1     1  2013            540         2
 4      544     1004     1     1  2013            545        -1
 5      554      812     1     1  2013            600        -6
 6      554      740     1     1  2013            558        -4
 7      555      913     1     1  2013            600        -5
 8      557      709     1     1  2013            600        -3
 9      557      838     1     1  2013            600        -3
10      558      753     1     1  2013            600        -2
# ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#   minute <dbl>, time_hour <dttm>
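A quick check, added for illustration and assuming dplyr and nycflights13 are loaded, that everything() does not repeat columns already named earlier in the call: the result still has 19 columns, only reordered.

```r
library(dplyr)
library(nycflights13)

moved <- select(flights, dep_time, arr_time, day, month, year, everything())

ncol(moved)        # still 19: no column appears twice
names(moved)[1:5]  # the five named columns now come first
```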

3.3.6 num_range()

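The flights data has no numbered columns, so no example is given with it here.
Rough usage: num_range("x", 1:3) matches x1, x2 and x3 (num_range("x", 1) would match only x1).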
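Since flights has no numbered columns, here is a sketch on a small hypothetical tibble (toy, invented purely for this illustration) with columns x1 to x4. It assumes dplyr is loaded; tibble() is re-exported by dplyr.

```r
library(dplyr)

# Hypothetical toy data, made up for this example only
toy <- tibble(id = 1:2, x1 = c(5, 6), x2 = c(7, 8), x3 = c(9, 10), x4 = c(1, 2))

# num_range() pastes the prefix onto each number: x1, x2, x3
first_three <- select(toy, num_range("x", 1:3))
names(first_three)

# A single number selects a single column
names(select(toy, num_range("x", 1)))
```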

Supplement: renaming columns with rename()

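select() can also rename columns, but this is not recommended, because select() drops every column that is not explicitly mentioned. rename() keeps all the other columns, so use it instead.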

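Usage: rename(data_frame, new_name = old_name)
Note: don't swap the two names. The new name goes on the left and the existing name on the right; written the other way round, rename() raises an error because the right-hand name is not an existing column.
Example: rename year to y with the following code.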

rename(flights,y = year)
# Output:
# A tibble: 336,776 x 19
       y month   day dep_time sched_dep_time dep_delay arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>
 1  2013     1     1      517            515         2      830
 2  2013     1     1      533            529         4      850
 3  2013     1     1      542            540         2      923
 4  2013     1     1      544            545        -1     1004
 5  2013     1     1      554            600        -6      812
 6  2013     1     1      554            558        -4      740
 7  2013     1     1      555            600        -5      913
 8  2013     1     1      557            600        -3      709
 9  2013     1     1      557            600        -3      838
10  2013     1     1      558            600        -2      753
# ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#   minute <dbl>, time_hour <dttm>
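The difference between the two approaches can be checked directly. This added sketch assumes dplyr and nycflights13 are loaded, and that no object named y exists in the workspace (otherwise the last call may behave differently).

```r
library(dplyr)
library(nycflights13)

renamed  <- rename(flights, y = year)  # renames and keeps all 19 columns
selected <- select(flights, y = year)  # renames but keeps only y

ncol(renamed)
ncol(selected)

# Neither call changes flights unless the result is assigned back to it
"year" %in% names(flights)

# Names written the wrong way round: there is no column y, so this errors
swapped <- tryCatch(rename(flights, year = y), error = function(e) "error")
swapped
```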

Getting Started with R - Data Transformation - 3. The select() and rename() functions
