I have a two data frames with string values:
df1 <- data.frame(values = c("apples_x", "oranges_z", "bananas_y", "berries_u", "melons_r"))
df2 = data.frame(values = c('apples','oranges','z','pears','x','bananas','plums','y','h','grapes','q'))
What I would like to do is perform a pairwise comparison between both data frames, by iterating over every row of data frame 2 and assigning pair numbers where both fruit and number of df1 value appears in df2. I want to create a new data frame that stores the pair numbers for the matches found. Ideally it would look something like this:
df3 %>% head()
values paired
<ch> <int>
1 apples 1
2 x 1
3 oranges 2
4 z 2
5 bananas 3
6 y 3
I tried to separate the values in df1 into two strings, but I am getting back strings with matches on any character.
lapply(df2, FUN=function(x){any(df1==x[[1]] & df1==x[[2]])})
CodePudding user response:
Based on the update, we may filter after splitting the column in 'df1', then create a sequence index and reshape to 'long' format
library(dplyr)
library(tidyr)
df1 %>%
separate(values, into = c('values1', 'values2')) %>%
filter(if_all(everything(), ~ .x %in% df2$values)) %>%
mutate(paired = row_number()) %>%
pivot_longer(cols = -paired, values_to = 'value', names_to = NULL) %>%
select(value, paired)
-output
# A tibble: 6 × 2
value paired
<chr> <int>
1 apples 1
2 x 1
3 oranges 2
4 z 2
5 bananas 3
6 y 3
