Below is the sample code. Seems simple enough but what I find online makes this far too complex. I am simply wanting to create a third data frame that contains one column.. the areas that common to firstdf and seconddf.
area1 <- c("001","005","007","009","011","013","015","017","019","021","023","025")
Employment1 <- c(2,4,6,8,110,12,14,15,16,17,12,20)
firstdf <- data.frame(area1,Employment1)
area2 <- c("005","007","011","013","015","021","027","033")
Employment2 <- c(100,101,102,103,104,111,321,522)
seconddf <- data.frame(area2,Employment2)
intended result
thirddf =>
area
005
007
011
013
015
021
CodePudding user response:
We can use intersect:
data.frame(area = intersect(firstdf$area1, seconddf$area2))
# area
# 1 005
# 2 007
# 3 011
# 4 013
# 5 015
# 6 021
though there's no need for the frame, it can easily be intersect(..) without the wrapping data.frame(.).
Candidly, though, it is common to want to preserve the other columns from either or both frames. Some thoughts:
### rows where `area` is in both
merge(firstdf, seconddf, by.x = "area1", by.y = "area2")
# area1 Employment1 Employment2
# 1 005 4 100
# 2 007 6 101
# 3 011 110 102
# 4 013 12 103
# 5 015 14 104
# 6 021 17 111
### always keep first frame
merge(firstdf, seconddf, by.x = "area1", by.y = "area2", all.x = TRUE)
# area1 Employment1 Employment2
# 1 001 2 NA
# 2 005 4 100
# 3 007 6 101
# 4 009 8 NA
# 5 011 110 102
# 6 013 12 103
# 7 015 14 104
# 8 017 15 NA
# 9 019 16 NA
# 10 021 17 111
# 11 023 12 NA
# 12 025 20 NA
### always keep second frame
merge(firstdf, seconddf, by.x = "area1", by.y = "area2", all.y = TRUE)
# area1 Employment1 Employment2
# 1 005 4 100
# 2 007 6 101
# 3 011 110 102
# 4 013 12 103
# 5 015 14 104
# 6 021 17 111
# 7 027 NA 321
# 8 033 NA 522
### keep all rows, NAs abound
merge(firstdf, seconddf, by.x = "area1", by.y = "area2", all = TRUE)
# area1 Employment1 Employment2
# 1 001 2 NA
# 2 005 4 100
# 3 007 6 101
# 4 009 8 NA
# 5 011 110 102
# 6 013 12 103
# 7 015 14 104
# 8 017 15 NA
# 9 019 16 NA
# 10 021 17 111
# 11 023 12 NA
# 12 025 20 NA
# 13 027 NA 321
# 14 033 NA 522
There are dplyr-variants of the merge(..) calls above that might be easier to read if you're already learning/using that dialect. The joining is over-kill if you never want all of the other columns, but I offer it in case that's the direction you're headed.
CodePudding user response:
library(dplyr)
tibble(intersect(firstdf$area1, seconddf$area2))
1 005
2 007
3 011
4 013
5 015
6 021
