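In the previous post we covered the basic syntax of the dplyr package, so you should now have a first feel for it. This post continues with more of the package's functions.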
I. Install and load the package
#install.packages("dplyr")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
II. Grouping: group_by()
# Group mtcars by cyl, then compute the mean of disp and hp within each group
by_cyl <- summarise(
  group_by(mtcars, cyl),
  disp = mean(disp),
  hp = mean(hp))
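As a further illustration of grouping, here is a minimal sketch that groups mtcars by two variables instead of one. The object names by_cyl_am and cyl_am_summary are made up for this example; n() counts the rows in each group, and .groups = "drop" (available in dplyr 1.0 and later) returns an ungrouped result.

```r
library(dplyr)

# Each distinct (cyl, am) combination becomes one group
by_cyl_am <- group_by(mtcars, cyl, am)

# One output row per group: the group size and the mean of mpg
cyl_am_summary <- summarise(
  by_cyl_am,
  n = n(),
  mean_mpg = mean(mpg),
  .groups = "drop")

cyl_am_summary
```

The group sizes add up to the number of rows in mtcars, which is a quick way to check that no rows were lost.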
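III. The pipe operator: %>%
Start with the name of the data set, then chain the operations you want to apply to it one after another. Each step passes its result to the next function as the first argument.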
# Add a variable Sepal.Length10 to iris, keep the columns Sepal.Length10
# and Sepal.Length, then take the first 15 rows
iris %>%
  mutate(Sepal.Length10 = Sepal.Length * 10) %>%
  select(Sepal.Length10, Sepal.Length) %>%
  slice(1:15)
## Sepal.Length10 Sepal.Length
## 1 51 5.1
## 2 49 4.9
## 3 47 4.7
## 4 46 4.6
## 5 50 5.0
## 6 54 5.4
## 7 46 4.6
## 8 50 5.0
## 9 44 4.4
## 10 49 4.9
## 11 54 5.4
## 12 48 4.8
## 13 48 4.8
## 14 43 4.3
## 15 58 5.8
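To see what the pipe is doing, it can help to write the same chain as nested function calls. The sketch below builds both versions and compares them; the names nested, piped and same are chosen only for this example.

```r
library(dplyr)

# Nested form: has to be read from the innermost call outwards
nested <- slice(
  select(
    mutate(iris, Sepal.Length10 = Sepal.Length * 10),
    Sepal.Length10, Sepal.Length),
  1:15)

# Piped form: reads top to bottom in the order the steps are applied
piped <- iris %>%
  mutate(Sepal.Length10 = Sepal.Length * 10) %>%
  select(Sepal.Length10, Sepal.Length) %>%
  slice(1:15)

# Compare the two results
same <- identical(nested, piped)
```

Both forms run the same three functions in the same order; the pipe only changes how the code is written.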
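IV. Joining data: join
Note: avoid factor columns in the data sets you join, for example by setting stringsAsFactors = FALSE when you build or read the data frames.
We use band_members and band_instruments, two example data sets that ship with dplyr. First take a look at the data: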
View(band_members)
View(band_instruments)
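If a data set already contains factor columns, one way to follow the note above is to convert them to character before joining. This is only a sketch: the data frames left_df and right_df and their columns are hypothetical, and across() with where() assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical data: the key column `id` is deliberately stored as a factor
left_df  <- data.frame(id = c("a", "b", "c"), x = 1:3, stringsAsFactors = TRUE)
right_df <- data.frame(id = c("b", "c", "d"), y = 4:6, stringsAsFactors = TRUE)

# Convert every factor column to character in both tables
left_chr  <- left_df  %>% mutate(across(where(is.factor), as.character))
right_chr <- right_df %>% mutate(across(where(is.factor), as.character))

# Join on the character key
joined <- inner_join(left_chr, right_chr, by = "id")
joined
```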
1. Inner join, inner_join(): keeps only the rows whose key appears in both tables (the intersection)
band_members %>% inner_join(band_instruments, by = "name")
## # A tibble: 2 x 3
## name band plays
## <chr> <chr> <chr>
## 1 John Beatles guitar
## 2 Paul Beatles bass
The command above is equivalent to the following one:
inner_join(band_members,band_instruments,by = "name")
## # A tibble: 2 x 3
## name band plays
## <chr> <chr> <chr>
## 1 John Beatles guitar
## 2 Paul Beatles bass
2. Left join, left_join(): keeps every row of the left table
band_members %>% left_join(band_instruments,by = "name")
## # A tibble: 3 x 3
## name band plays
## <chr> <chr> <chr>
## 1 Mick Stones <NA>
## 2 John Beatles guitar
## 3 Paul Beatles bass
3. Right join, right_join(): keeps every row of the right table
band_members %>% right_join(band_instruments,by = "name")
## # A tibble: 3 x 3
## name band plays
## <chr> <chr> <chr>
## 1 John Beatles guitar
## 2 Paul Beatles bass
## 3 Keith <NA> guitar
4. Full join, full_join(): keeps every row of both tables
band_members %>% full_join(band_instruments,by = "name")
## # A tibble: 4 x 3
## name band plays
## <chr> <chr> <chr>
## 1 Mick Stones <NA>
## 2 John Beatles guitar
## 3 Paul Beatles bass
## 4 Keith <NA> guitar
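All the joins above match on a column called name in both tables. When the key columns have different names, a named vector can be passed to by. The sketch below uses band_instruments2, another example data set in dplyr in which the key column is called artist; the object name full is made up for this example.

```r
library(dplyr)

# band_members has the key `name`, band_instruments2 has the key `artist`
full <- band_members %>%
  full_join(band_instruments2, by = c("name" = "artist"))

# The key column in the result takes its name from the left table
full
```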
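V. Combining data: bind
bind_rows() and bind_cols() are dplyr's counterparts of rbind() and cbind() in base R.
The examples use starwars, a data set that ships with dplyr.
1. Create the data sets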
one <- starwars[1:4, ]
two <- starwars[9:12, ]
View(one)
View(two)
2. Binding rows
bind_rows(list(one, two))
## # A tibble: 8 x 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sk~ 172 77 blond fair blue 19 male mascu~
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascu~
## 3 R2-D2 96 32 <NA> white, bl~ red 33 none mascu~
## 4 Darth V~ 202 136 none white yellow 41.9 male mascu~
## 5 Biggs D~ 183 84 black light brown 24 male mascu~
## 6 Obi-Wan~ 182 77 auburn, wh~ fair blue-gray 57 male mascu~
## 7 Anakin ~ 188 84 blond fair blue 41.9 male mascu~
## 8 Wilhuff~ 180 NA auburn, gr~ fair blue 64 male mascu~
## # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
Another row-binding example:
bind_rows(
c(a = 1, b = 2),
tibble(a = 3:4, b = 5:6),
c(a = 7, b = 8))
## # A tibble: 4 x 2
## a b
## <dbl> <dbl>
## 1 1 2
## 2 3 5
## 3 4 6
## 4 7 8
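Two other behaviours of bind_rows() often come up in practice, sketched below under the assumption of a recent dplyr version: the .id argument records which input each row came from, and columns missing from one input are filled with NA. The names labelled, narrow, wide and filled are made up for this example.

```r
library(dplyr)

one <- starwars[1:4, ]
two <- starwars[9:12, ]

# .id adds a column holding the list names, so the origin of each row is kept
labelled <- bind_rows(list(first = one, second = two), .id = "source")

# Inputs with different columns: the missing column b is filled with NA
narrow <- tibble(a = 1:2)
wide   <- tibble(a = 3L, b = "x")
filled <- bind_rows(narrow, wide)
filled
```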
3. Binding columns
bind_cols(one, two)  # equivalent to bind_cols(list(one, two))
## New names:
## * name -> name...1
## * height -> height...2
## * mass -> mass...3
## * hair_color -> hair_color...4
## * skin_color -> skin_color...5
## * ...
## # A tibble: 4 x 28
## name...1 height...2 mass...3 hair_color...4 skin_color...5 eye_color...6
## <chr> <int> <dbl> <chr> <chr> <chr>
## 1 Luke Skywalker 172 77 blond fair blue
## 2 C-3PO 167 75 <NA> gold yellow
## 3 R2-D2 96 32 <NA> white, blue red
## 4 Darth Vader 202 136 none white yellow
## # ... with 22 more variables: birth_year...7 <dbl>, sex...8 <chr>,
## # gender...9 <chr>, homeworld...10 <chr>, species...11 <chr>,
## # films...12 <list>, vehicles...13 <list>, starships...14 <list>,
## # name...15 <chr>, height...16 <int>, mass...17 <dbl>, hair_color...18 <chr>,
## # skin_color...19 <chr>, eye_color...20 <chr>, birth_year...21 <dbl>,
## # sex...22 <chr>, gender...23 <chr>, homeworld...24 <chr>,
## # species...25 <chr>, films...26 <list>, vehicles...27 <list>,
## # starships...28 <list>
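The renaming in the output above happens because both inputs carry the same column names. bind_cols() pairs rows purely by position and does not look at any key, so it fits best when the inputs hold different columns for the same rows in the same order. A minimal sketch, with the names ids, sizes and combined made up for this example:

```r
library(dplyr)

one <- starwars[1:4, ]

# Two pieces of the same four rows, with no column names in common
ids   <- select(one, name)
sizes <- select(one, height, mass)

# Rows are matched by position, so both inputs need the same number of rows
combined <- bind_cols(ids, sizes)
combined
```

When the rows have to be matched by a key instead of by position, one of the join functions from the previous section is the safer choice.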
VI. Set operations: set
intersect(band_members[, 1], band_instruments[, 1])  # intersection: names present in both tables
## # A tibble: 2 x 1
## name
## <chr>
## 1 John
## 2 Paul
union(band_members[, 1], band_instruments[, 1])  # union of the two sets, duplicates removed
## # A tibble: 4 x 1
## name
## <chr>
## 1 Mick
## 2 John
## 3 Paul
## 4 Keith
union_all(band_members[, 1], band_instruments[, 1])  # union of the two sets, duplicates kept
## # A tibble: 6 x 1
## name
## <chr>
## 1 Mick
## 2 John
## 3 Paul
## 4 John
## 5 Paul
## 6 Keith
setdiff(band_members[, 1], band_instruments[, 1])  # set difference: in the first table but not in the second
## # A tibble: 1 x 1
## name
## <chr>
## 1 Mick
setequal(band_members[, 1], band_instruments[, 1])  # test whether the two sets are equal
## [1] FALSE
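The examples above pass one-column tables, but dplyr's set operations compare whole rows, so they also work on tables with several columns as long as both have the same columns. A small sketch with two hypothetical tibbles a and b:

```r
library(dplyr)

# Hypothetical tables with identical column names and types
a <- tibble(x = 1:3, y = c("a", "b", "c"))
b <- tibble(x = 2:4, y = c("b", "c", "z"))

both   <- intersect(a, b)  # rows that appear in both a and b
either <- union(a, b)      # rows that appear in a or b, duplicates removed
only_a <- setdiff(a, b)    # rows that appear in a but not in b
```

A row counts as shared only when every column matches.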
Other functions: count() counts the rows for each unique value of a column
count(iris,Species)
## Species n
## 1 setosa 50
## 2 versicolor 50
## 3 virginica 50
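count() can be read as a shortcut for grouping followed by a summary with n(). The sketch below shows both forms on mtcars, plus the sort argument, which orders the result by the count in descending order; the names via_count, via_group and sorted are made up for this example.

```r
library(dplyr)

# Shortcut form
via_count <- count(mtcars, cyl)

# Long form: group, then count the rows in each group
via_group <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n())

# sort = TRUE puts the largest group first
sorted <- count(mtcars, cyl, sort = TRUE)
```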