Count word matches between two variables

Assume two datasets A and B:

X1<- c('a', 'b','c')
place<-c('andes','brooklyn', 'comorin')
A<-data.frame(X1,place)

X2<-c('a','a','a','b','c','c','d')
place2<-c('andes','alamo','andes','brooklyn','comorin','camden','dover')
B<-data.frame(X2,place2)

I want to count how many times each word in A$place occurs in B$place2.

CodePudding user response:

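Use str_detect() from the stringr package. Its first argument is the vector to search and its second is the pattern, so B$place2 goes first:

library(stringr)
sapply(A$place, function(x) sum(str_detect(B$place2, x)))

   andes brooklyn  comorin 
       2        1        1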

CodePudding user response:

A possible solution:

library(tidyverse)

A %>% 
  rowwise %>% 
  mutate(n = sum(place == B$place2)) %>% 
  ungroup

#> # A tibble: 3 × 3
#>   X1    place        n
#>   <chr> <chr>    <int>
#> 1 a     andes        2
#> 2 b     brooklyn     1
#> 3 c     comorin      1

CodePudding user response:

table(B$place2[B$place2 %in% A$place])

# andes brooklyn  comorin 
#     2        1        1
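
One thing to watch with table(): a word in A$place that never appears in B$place2 is dropped from the result rather than reported as 0. Below is a minimal sketch of one way to keep zero counts, assuming you convert B$place2 to a factor whose levels are the words in A$place. The extra word "zermatt" is hypothetical and added only to produce a zero count.

```r
# Data from the question, plus one hypothetical word ("zermatt")
# that does not occur in place2
A <- data.frame(X1 = c("a", "b", "c", "z"),
                place = c("andes", "brooklyn", "comorin", "zermatt"))
B <- data.frame(X2 = c("a", "a", "a", "b", "c", "c", "d"),
                place2 = c("andes", "alamo", "andes", "brooklyn",
                           "comorin", "camden", "dover"))

# Values of place2 that are not in the levels become NA,
# and table() ignores NA by default
counts <- table(factor(B$place2, levels = A$place))

# Attach the counts to A; the order follows the factor levels,
# which are A$place in row order
A$n <- as.vector(counts)
print(A)
```

Because the levels come from A$place, the counts line up with the rows of A and can be added as a column directly.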

CodePudding user response:

Here's a base R version of user438383's answer.

sapply(A$place, function(y) sum(grepl(y, B$place2)))  

   andes brooklyn  comorin 
       2        1        1

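The key functions are sapply(), which applies an operation to each element of a vector; grepl(), which tests every element of B$place2 against the pattern and returns TRUE or FALSE for each; and sum(). Summing a logical vector gives the number of TRUE values. Note that grepl() treats the word as a regular expression and matches it anywhere in the string, so a word is also counted when it appears inside a longer entry. If you need exact matches only, compare with B$place2 == y instead.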
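
To make the three steps concrete, here is a short sketch that runs them one at a time on the question's data. The variable names hits, counts and exact are illustrative only, and fixed = TRUE is an optional addition (not part of the answer above) that makes grepl() treat the word as a literal string rather than a regular expression.

```r
# Data from the question
place  <- c("andes", "brooklyn", "comorin")
place2 <- c("andes", "alamo", "andes", "brooklyn",
            "comorin", "camden", "dover")

# Step 1: grepl() returns one TRUE/FALSE per element of place2
hits <- grepl("andes", place2)
print(hits)

# Step 2: sum() counts the TRUE values (TRUE is treated as 1)
print(sum(hits))

# Step 3: sapply() repeats steps 1 and 2 for every word in place;
# fixed = TRUE matches the word literally, but still as a substring
counts <- sapply(place, function(y) sum(grepl(y, place2, fixed = TRUE)))
print(counts)

# Exact-match variant: counts only entries equal to the whole word
exact <- sapply(place, function(y) sum(place2 == y))
print(exact)
```

On this data both variants give the same counts, because no word in place occurs inside a longer entry of place2.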