Home > Enterprise >  Trying to match strings from multiple columns and create pair list where matches are found
Trying to match strings from multiple columns and create pair list where matches are found

Time:01-20

I have a two data frames with string values:

df1 <- data.frame(values = c("apples_x", "oranges_z", "bananas_y", "berries_u", "melons_r"))

df2 = data.frame(values  = c('apples','oranges','z','pears','x','bananas','plums','y','h','grapes','q')) 

What I would like to do is perform a pairwise comparison between both data frames, by iterating over every row of data frame 2 and assigning pair numbers where both fruit and number of df1 value appears in df2. I want to create a new data frame that stores the pair numbers for the matches found. Ideally it would look something like this:

df3 %>% head()
  values  paired
  <ch>     <int>
1 apples      1         
2 x           1          
3 oranges     2          
4 z           2          
5 bananas     3          
6 y           3 

I tried to separate the values in df1 into two strings, but I am getting back strings with matches on any character.

lapply(df2, FUN=function(x){any(df1==x[[1]] & df1==x[[2]])})

CodePudding user response:

Based on the update, we may filter after splitting the column in 'df1', then create a sequence index and reshape to 'long' format

library(dplyr)
library(tidyr)
df1  %>% 
  separate(values, into = c('values1', 'values2')) %>% 
  filter(if_all(everything(), ~ .x %in% df2$values)) %>%
  mutate(paired = row_number()) %>% 
  pivot_longer(cols = -paired, values_to = 'value', names_to = NULL) %>%
  select(value, paired)

-output

# A tibble: 6 × 2
  value   paired
  <chr>    <int>
1 apples       1
2 x            1
3 oranges      2
4 z            2
5 bananas      3
6 y            3
  •  Tags:  
  • Related