用dplyr合并数据
Chapter1. 合并数据表
Chapter2. 向左,向右合并
Chapter3. 完全合并,半完全合并
Chapter4. 问题实践
Chapter1. 合并数据表
有两组数据,一组叫parts
,一组叫part_categories
>
head(parts)
# A tibble: 6 x 3
part_num name part_cat_id
<chr> <chr> <dbl>
1 0901 Baseplate 16 x 30 with Set 080 Yellow House Print 1
2 0902 Baseplate 16 x 24 with Set 080 Small White House Print 1
3 0903 Baseplate 16 x 24 with Set 080 Red House Print 1
4 0904 Baseplate 16 x 24 with Set 080 Large White House Print 1
5 1 Homemaker Bookcase 2 x 4 x 4 7
6 10016414 Sticker Sheet #1 for 41055-1
>
head(part_categories)
# A tibble: 6 x 2
id name
<dbl> <chr>
1 1 Baseplates
2 3 Bricks Sloped
3 4 Duplo, Quatro and Primo
4 5 Bricks Special
5 6 Bricks Wedged
6 7 Containers
要通过part_cat_id
和id
对两个数组进行合并。
parts %>%
inner_join(part_categories,by=c("part_cat_id"="id"))
# A tibble: 17,501 x 4
part_num name.x part_cat_id name.y
<chr> <chr> <dbl> <chr>
1 0901 Baseplate 16 x 30 with Set 080 Yellow ~ 1 Baseplates
2 0902 Baseplate 16 x 24 with Set 080 Small W~ 1 Baseplates
3 0903 Baseplate 16 x 24 with Set 080 Red Hou~ 1 Baseplates
4 0904 Baseplate 16 x 24 with Set 080 Large W~ 1 Baseplates
5 1 Homemaker Bookcase 2 x 4 x 4 7 Containers
6 10016414 Sticker Sheet #1 for 41055-1 58 Stickers
7 10026stk01 Sticker for Set 10026 - (44942/4184185) 58 Stickers
8 10039 Pullback Motor 8 x 4 x 2/3 44 Mechanical
9 10048 Minifig Hair Tousled 65 Minifig Headw~
10 10049 Minifig Shield Broad with Spiked Botto~ 27 Minifig Acces~
# ... with 17,491 more rows
发现一个问题,就是有两个变量重名了,默认变成了name.x
和name.y
。很明显看着不爽,要改成别的名字。可以用suffix
来给变量名添加后缀。
parts %>%
inner_join(part_categories, by = c("part_cat_id" = "id"), suffix=c("_part","_category"))
# A tibble: 17,501 x 4
part_num name_part part_cat_id name_category
<chr> <chr> <dbl> <chr>
1 0901 Baseplate 16 x 30 with Set 080 Yello~ 1 Baseplates
2 0902 Baseplate 16 x 24 with Set 080 Small~ 1 Baseplates
3 0903 Baseplate 16 x 24 with Set 080 Red H~ 1 Baseplates
4 0904 Baseplate 16 x 24 with Set 080 Large~ 1 Baseplates
5 1 Homemaker Bookcase 2 x 4 x 4 7 Containers
6 10016414 Sticker Sheet #1 for 41055-1 58 Stickers
7 10026stk01 Sticker for Set 10026 - (44942/41841~ 58 Stickers
8 10039 Pullback Motor 8 x 4 x 2/3 44 Mechanical
9 10048 Minifig Hair Tousled 65 Minifig Headwear
10 10049 Minifig Shield Broad with Spiked Bot~ 27 Minifig Accesso~
# ... with 17,491 more rows
当然同样的方法还可以用来合并多个数据。
sets %>%
# Add inventories using an inner join
inner_join(inventories,by="set_num") %>%
# Add inventory_parts using an inner join
inner_join(inventory_parts,by=c("id"="inventory_id"))
# A tibble: 258,958 x 9
set_num name year theme_id id version part_num color_id quantity
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
1 700.3-1 Medium Gift ~ 1949 365 24197 1 bdoor01 2 2
2 700.3-1 Medium Gift ~ 1949 365 24197 1 bdoor01 15 1
3 700.3-1 Medium Gift ~ 1949 365 24197 1 bdoor01 4 1
4 700.3-1 Medium Gift ~ 1949 365 24197 1 bslot02 15 6
5 700.3-1 Medium Gift ~ 1949 365 24197 1 bslot02 2 6
6 700.3-1 Medium Gift ~ 1949 365 24197 1 bslot02 4 6
7 700.3-1 Medium Gift ~ 1949 365 24197 1 bslot02 1 6
8 700.3-1 Medium Gift ~ 1949 365 24197 1 bslot02 14 6
9 700.3-1 Medium Gift ~ 1949 365 24197 1 bslot02a 15 6
10 700.3-1 Medium Gift ~ 1949 365 24197 1 bslot02a 2 6
# ... with 258,948 more rows