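I'm covering merge() because it works much like VLOOKUP in Excel: based on matching key values, it looks up rows in a data frame (or matrix) and pulls in the information you want.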
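Here is a minimal sketch of the VLOOKUP analogy. It uses a hypothetical orders table and a hypothetical prices lookup table, neither of which comes from the original data. Merging with all.x = TRUE keeps every order and fills in its price, and it leaves NA where the lookup fails, much as VLOOKUP returns #N/A. One difference: VLOOKUP returns only the first match, whereas merge() returns one row per matching pair. stringsAsFactors = FALSE only keeps the behavior the same on R versions before 4.0.

```r
# Hypothetical example data: orders to look up, and a price lookup table
orders <- data.frame(item = c("pen", "book", "lamp"), qty = c(3, 1, 2),
                     stringsAsFactors = FALSE)
prices <- data.frame(item = c("book", "pen"), price = c(12.5, 1.2),
                     stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of orders; "lamp" has no price, so it gets NA
looked_up <- merge(orders, prices, by = "item", all.x = TRUE)
looked_up
```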
1 Prepare the datasets
Prepare the authors and authorN data frames:
authors <- data.frame(
## I(*) : use character columns of names to get sensible sort order
surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
nationality = c("US", "Australia", "US", "UK", "Australia"),
deceased = c("yes", rep("no", 4)))
# View the data frame
authors
# Copy of authors with the surname column renamed to name
authorN <- within(authors, { name <- surname; rm(surname) })
authorN
Prepare the books data frame:
books <- data.frame(
name = I(c("Tukey", "Venables", "Tierney",
"Ripley", "Ripley", "McNeil", "R Core")),
title = c("Exploratory Data Analysis",
"Modern Applied Statistics ...",
"LISP-STAT",
"Spatial Statistics", "Stochastic Simulation",
"Interactive Data Analysis",
"An Introduction to R"),
other.author = c(NA, "Ripley", NA, NA, NA, NA,
"Venables & Smith"))
books
2 Merge authorN and books directly
(m0 <- merge(authorN, books))
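Without a by argument, merge() joins on the columns whose names appear in both data frames (here only name) and keeps only the rows that match in both, so R Core, which has no entry in authorN, is dropped from the result.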
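When by is omitted, merge() defaults to by = intersect(names(x), names(y)), that is, every column name the two data frames share. For authorN and books, that is only name. The sketch below checks this on trimmed copies of the post's data. For brevity it assumes that only name, nationality and title are kept. It then compares the default call with an explicit by = "name".

```r
# Trimmed copies of the post's data (assumption: fewer columns for brevity)
authorN <- data.frame(
  name = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  stringsAsFactors = FALSE)
books <- data.frame(
  name = c("Tukey", "Venables", "Tierney", "Ripley", "Ripley", "McNeil", "R Core"),
  title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
            "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis", "An Introduction to R"),
  stringsAsFactors = FALSE)

# The columns merge() uses when by is not given
shared <- intersect(names(authorN), names(books))
shared                            # "name"

m_default  <- merge(authorN, books)
m_explicit <- merge(authorN, books, by = "name")
identical(m_default, m_explicit)  # TRUE
nrow(m_default)                   # 6: "R Core" has no match in authorN
```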
3 Merge on the surname column of authors and the name column of books
(m1 <- merge(authors, books, by.x = "surname", by.y = "name"))
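In the result, authors is the base. The key column comes first, named surname after authors. It is followed by the other authors columns and then the matched books columns.
If you swap the two data frames, books becomes the base and the authors columns are matched onto it: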
m2 <- merge(books, authors, by.x = "name", by.y = "surname")
m2
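Swapping the arguments changes only the column layout, not which rows match. The sketch below re-creates the post's data and compares the two results. It uses stringsAsFactors = FALSE in place of I() so that the columns are plain character vectors. merge() places the key column first, named after the key in x. The remaining columns of x follow, and then those of y.

```r
authors <- data.frame(
  surname = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)),
  stringsAsFactors = FALSE)
books <- data.frame(
  name = c("Tukey", "Venables", "Tierney", "Ripley", "Ripley", "McNeil", "R Core"),
  title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
            "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis", "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA, "Venables & Smith"),
  stringsAsFactors = FALSE)

m1 <- merge(authors, books, by.x = "surname", by.y = "name")
m2 <- merge(books, authors, by.x = "name", by.y = "surname")

names(m1)  # "surname" "nationality" "deceased" "title" "other.author"
names(m2)  # "name" "title" "other.author" "nationality" "deceased"

# Same matched rows either way: 6, with "R Core" dropped
c(nrow(m1), nrow(m2))
```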
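4 Setting all = TRUE
With all = TRUE, rows that find no match are kept as well, and the columns they lack are filled with NA. This is a full outer join: every row of both data frames appears in the result.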
merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)
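This result has one more row than the previous merge: the second row, R Core. The name column of books is matched against the surname column of authors. authors has no R Core, so that row's nationality and deceased columns are NA. This lets you combine data without losing any rows from the original data frames.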
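all = TRUE is shorthand for all.x = TRUE plus all.y = TRUE. Setting just one of them keeps the unmatched rows of only that side, which gives a left or right outer join. The sketch below runs on trimmed copies of the post's data. It assumes that only the key column plus one other column per data frame is needed.

```r
# Trimmed copies of the post's data (assumption: fewer columns for brevity)
authors <- data.frame(
  surname = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  stringsAsFactors = FALSE)
books <- data.frame(
  name = c("Tukey", "Venables", "Tierney", "Ripley", "Ripley", "McNeil", "R Core"),
  title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
            "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis", "An Introduction to R"),
  stringsAsFactors = FALSE)

# Left join: keep every author (all of them have a book, so nothing is added)
left  <- merge(authors, books, by.x = "surname", by.y = "name", all.x = TRUE)
# Right join: keep every book, including "R Core"
right <- merge(authors, books, by.x = "surname", by.y = "name", all.y = TRUE)
# Full outer join: keep unmatched rows from both sides
full  <- merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)

c(nrow(left), nrow(right), nrow(full))  # 6 7 7
full[full$surname == "R Core", ]        # nationality is NA
```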
5 Merge on multiple columns
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
x
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)
y
Match the two data frames on both key columns, k1 and k2, or on k1 alone. By default, NA in a key column matches NA:
merge(x, y, by = c("k1","k2")) # NA's match
merge(x, y, by = "k1") # NA's match, so 6 rows
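When the data frames share non-key columns, merge() tells them apart with the suffixes argument, which defaults to c(".x", ".y"). Here x and y share data, and they also share k2 when joined on k1 alone. The sketch below reproduces the row counts above and swaps in custom suffixes. The "_x"/"_y" choice is arbitrary.

```r
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)

# Both keys: (NA, NA) matches (NA, NA), plus (4, 4) and (5, 5)
both <- merge(x, y, by = c("k1", "k2"))
nrow(both)  # 3

# k1 only: k2 and data exist in both inputs, so they get suffixes
one <- merge(x, y, by = "k1", suffixes = c("_x", "_y"))
names(one)  # "k1" "k2_x" "data_x" "k2_y" "data_y"
nrow(one)   # 6: 2 x 2 NA pairs plus the matches on 4 and 5
```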
6 Setting incomparables = NA
With incomparables = NA, NA values in the key column are never matched, so no rows are joined on NA keys:
merge(x, y, by = "k2", incomparables = NA) # 2 rows
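The following compares the same single-column merge with and without incomparables = NA, using the x and y from above. The R documentation describes incomparables as intended for merging on one column, so this sketch joins on k2 only.

```r
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)

# Default: the two NA keys in x each match the two NA keys in y
with_na    <- merge(x, y, by = "k2")
# incomparables = NA: NA keys are never matched
without_na <- merge(x, y, by = "k2", incomparables = NA)

nrow(with_na)         # 6: 2 x 2 NA pairs plus the matches on 4 and 5
nrow(without_na)      # 2
anyNA(without_na$k2)  # FALSE
```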