How can I iterate grepl over a vector of terms and a vector of text?

Time:02-04

I have a vector of terms and some text

terms <- c("this","that","those")
text <- c("this is some text","here is more text with those words","more text than that other one","this ends it")

I would like to use grepl to search for all terms in all text and return TRUE/FALSE by term or by text. I tried

sapply(text,grepl,pattern = terms)

but this only returned the answer for the first term

sapply(text,grepl,terms)

this ran but did not give the correct answers (every term came back FALSE, i.e. not appearing in any text).

sapply(text,grepl,sapply(terms,'['))

this also did not work and returned incorrect answers (all false)

Answer:

You were close. grepl only uses the first element of pattern=, so each call needs a single term. Iterate over terms instead of text, and pass the whole text vector as x=:

out <- sapply(terms, grepl, x = text)
out
#       this  that those
# [1,]  TRUE FALSE FALSE
# [2,] FALSE FALSE  TRUE
# [3,] FALSE  TRUE FALSE
# [4,]  TRUE FALSE FALSE
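
A possible variant of the same idea, as a sketch rather than a required change: vapply makes the expected return type explicit, and fixed = TRUE assumes the terms should be matched as literal strings, not regular expressions. Labelling the rows with the text is an optional extra so that cells can be looked up by name.

```r
terms <- c("this", "that", "those")
text <- c("this is some text",
          "here is more text with those words",
          "more text than that other one",
          "this ends it")

# vapply() errors unless every call returns one logical per text element.
# fixed = TRUE (an assumption about intent) treats each term as a literal string.
out <- vapply(terms, grepl, logical(length(text)), x = text, fixed = TRUE)

# Optional: label rows by the text they describe.
rownames(out) <- text

out["this ends it", "this"]
```

Note that this is still substring matching, so a term such as "this" would also match inside a longer word.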

If you only need to know whether anything matches, use colSums (does each term appear in any text?) or rowSums (does each text contain any term?):

colSums(out) > 0
#  this  that those 
#  TRUE  TRUE  TRUE 
setNames(rowSums(out) > 0, nm = text)
#                  this is some text here is more text with those words      more text than that other one 
#                               TRUE                               TRUE                               TRUE 
#                       this ends it 
#                               TRUE 

(The setNames was purely to identify which logical is for which text, in case that's the direction you wanted to go.)
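
Two related sketches, in case they help. If only the per-text "any term" answer is needed, a single grepl call with the terms joined by "|" should give the same result as rowSums(out) > 0, assuming the terms contain no regex metacharacters. To see which terms matched each text, the columns of out can be indexed row by row; the names any_term and matched are just illustrative.

```r
terms <- c("this", "that", "those")
text <- c("this is some text",
          "here is more text with those words",
          "more text than that other one",
          "this ends it")
out <- sapply(terms, grepl, x = text)

# One regex with alternation: TRUE where a text contains at least one term.
# Assumes the terms contain no regex metacharacters.
any_term <- grepl(paste(terms, collapse = "|"), text)

# For each text (row of out), the terms that matched, as a named list.
matched <- lapply(seq_len(nrow(out)), function(i) colnames(out)[out[i, ]])
matched <- setNames(matched, text)

matched[["this ends it"]]
```

A list is used for matched because a text may contain zero, one, or several of the terms.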
