I have a character string vector that I would like to filter based on keywords from a second vector.
Below is a small reprex:
list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple","banana")
I presume I will need stringr/stringi, but in essence I would like to do something along the lines of list1 %in% fruit and have it return TRUE, FALSE, TRUE.
Any suggestions?
CodePudding user response:
We can do this with grepl without using external packages.
grepl can match several alternatives in one pattern when they are separated by |, so we can first concatenate the strings in fruit with | as the separator.
Remember to set ignore.case = TRUE if you don't care about case (note that "Bananas" in your example is capitalized, while the keyword "banana" is not).
grepl(paste(fruit, collapse = "|"), list1, ignore.case = TRUE)
[1] TRUE FALSE TRUE
Or to subset list1:
list1[grepl(paste(fruit, collapse = "|"), list1, ignore.case = TRUE)]
[1] "I like apples" "Bananas are my favorite"
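To make the intermediate step visible, here is a minimal sketch that prints the combined pattern and wraps the grepl call in a small helper. The name contains_any is hypothetical (it is not a base R function), and the sketch assumes the keywords contain no regular expression metacharacters, since they are interpreted as regex.

```r
list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple", "banana")

# Collapse the keywords into a single alternation pattern.
pattern <- paste(fruit, collapse = "|")
pattern
# [1] "apple|banana"

# contains_any() is a hypothetical helper name used only for illustration.
# It returns TRUE for each element of x that matches at least one keyword.
contains_any <- function(x, keywords, ignore_case = TRUE) {
  grepl(paste(keywords, collapse = "|"), x, ignore.case = ignore_case)
}

contains_any(list1, fruit)
# [1]  TRUE FALSE  TRUE

# With case-sensitive matching, "Bananas" no longer matches "banana".
contains_any(list1, fruit, ignore_case = FALSE)
# [1]  TRUE FALSE FALSE
```

The second call shows why ignore.case matters for this particular example.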
CodePudding user response:
A solution with str_detect:
library(tidyverse)
data.frame(list1) %>%
  mutate(Flag = str_detect(list1, paste0("(?i)", paste0(fruit, collapse = "|"))))
                    list1  Flag
1           I like apples  TRUE
2             I eat bread FALSE
3 Bananas are my favorite  TRUE
If you want to filter (i.e. subset) your data:
data.frame(list1) %>%
  filter(str_detect(list1, paste0("(?i)", paste0(fruit, collapse = "|"))))
                    list1
1           I like apples
2 Bananas are my favorite
Note that (?i) is used to make the match case-insensitive.
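As an alternative to prefixing the pattern with (?i), stringr's regex() modifier accepts an ignore_case argument. The sketch below assumes the stringr package is installed and works on the plain character vector rather than a data frame; the variable name pattern is just for illustration.

```r
library(stringr)

list1 <- c("I like apples", "I eat bread", "Bananas are my favorite")
fruit <- c("apple", "banana")

# Wrap the combined pattern in regex() so case is ignored when matching.
pattern <- regex(paste(fruit, collapse = "|"), ignore_case = TRUE)

# Logical vector: does each element contain any of the keywords?
str_detect(list1, pattern)
# [1]  TRUE FALSE  TRUE

# Keep only the matching elements.
str_subset(list1, pattern)
```

This may be a little easier to read than the inline flag when the pattern is built programmatically.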
