I'm trying to write code that checks whether a string contains any of the words in a list of terms, so that I can create a new column in a data frame.
This is the list of terms:
vehicles <- c('vehicle', 'mazda', 'nissan', 'ford', 'honda', 'chevrolet', 'toyota')
Examples of the strings I'm searching include "2001 honda civic", "2003 nissan altima" and "2005 mazda 5" (these are the values of asset_name in the code below).
My simplified code looks like this:
df %>%
  mutate(
    asset_type = case_when(
      vehicles %in% asset_name == TRUE ~ 'vehicle', # this doesn't work, obviously
      <CODE THAT DOES WORK HERE!!!>
      TRUE ~ asset_name
    )
  )
I've tried str_detect, str_extract, grepl and a custom function, but I can't figure out how to make this work.
I know that for each asset_name entry I need to loop through the vehicles list to see whether any of the vehicle terms appears in asset_name, but I can't get it to work. grr...
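The per-entry loop described above can be written out explicitly in base R. This is a minimal sketch that uses only the sample strings from the question plus one made-up non-match ("unmatched1"); fixed = TRUE makes grepl treat each term as literal text rather than as a regular expression:

```r
vehicles <- c('vehicle', 'mazda', 'nissan', 'ford', 'honda', 'chevrolet', 'toyota')
# Sample strings for illustration; "unmatched1" is a made-up non-vehicle entry
asset_name <- c("2001 honda civic", "2003 nissan altima", "2005 mazda 5", "unmatched1")

# Outer loop: one asset name at a time.
# Inner loop: test every term against that name, then ask whether any term matched.
is_vehicle <- vapply(asset_name, function(name) {
  any(vapply(vehicles, function(term) grepl(term, name, fixed = TRUE), logical(1)))
}, logical(1), USE.NAMES = FALSE)

is_vehicle
# TRUE TRUE TRUE FALSE
```

This works, but grepl is already vectorized over the strings it searches (though not over the pattern), so the explicit outer loop is unnecessary: the approaches below either combine all terms into one pattern or loop only over the terms.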
Thanks in advance!!!
Answer:
One approach is to build a single regex alternation of the vehicle terms, wrapped in word boundaries, and then use grepl to test each asset_name against it:
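library(dplyr)
vehicles <- c('vehicle', 'mazda', 'nissan', 'ford', 'honda', 'chevrolet', 'toyota')
# Builds "\\b(?:vehicle|mazda|...|toyota)\\b": any one term, matched as a whole word
regex <- paste0("\\b(?:", paste(vehicles, collapse = "|"), ")\\b")
df %>%
  mutate(
    asset_type = case_when(
      grepl(regex, asset_name) ~ 'vehicle',  # TRUE if any term appears in asset_name
      # ... any further conditions go here ...
      TRUE ~ asset_name
    )
  )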
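To see what the combined pattern looks like and how it behaves, here is a small base R sketch. The sample strings (including the capitalised one and "stanford bike") are made up for illustration, and perl = TRUE is an addition not in the answer above: it pins matching to the PCRE engine, which supports both \b and (?:...). Note that grepl is case-sensitive unless ignore.case = TRUE is passed:

```r
vehicles <- c('vehicle', 'mazda', 'nissan', 'ford', 'honda', 'chevrolet', 'toyota')
regex <- paste0("\\b(?:", paste(vehicles, collapse = "|"), ")\\b")

# writeLines() shows the pattern as the regex engine sees it (single backslashes)
writeLines(regex)
# \b(?:vehicle|mazda|nissan|ford|honda|chevrolet|toyota)\b

# Hypothetical test strings for illustration only
samples <- c("2001 honda civic", "2001 Honda Civic", "stanford bike", "unmatched1")

# Case-sensitive by default: "Honda" is not matched
case_sensitive <- grepl(regex, samples, perl = TRUE)
case_sensitive
# TRUE FALSE FALSE FALSE

# ignore.case = TRUE also matches capitalised makes;
# "stanford" still fails because \b requires "ford" to start a word
case_insensitive <- grepl(regex, samples, ignore.case = TRUE, perl = TRUE)
case_insensitive
# TRUE TRUE FALSE FALSE
```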
Answer:
Adapted from this answer:
library(tidyverse)
vehicles <- c('vehicle', 'mazda', 'nissan', 'ford', 'honda', 'chevrolet', 'toyota')
asset_name <- c("2001 honda civic", "2003 nissan altima", "2005 mazda 5",
                "unmatched1", "unmatched2")  # added unmatched strings
x <- seq_along(asset_name)  # dummy variable to make df
df <- data.frame(x, asset_name)
df %>%
  mutate(asset_type = case_when(
    # grep(term, asset_name, value = TRUE) returns the names containing each term;
    # unlist() pools them, and %in% flags the rows that appear in that pool
    asset_name %in% unlist(lapply(vehicles, grep, asset_name, value = TRUE)) ~ 'vehicle',
    TRUE ~ asset_name)
  )
Output:
  x         asset_name asset_type
1 1   2001 honda civic    vehicle
2 2 2003 nissan altima    vehicle
3 3       2005 mazda 5    vehicle
4 4         unmatched1 unmatched1
5 5         unmatched2 unmatched2
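The lapply/grep idea can also yield one TRUE/FALSE per row directly, by OR-ing the per-term results together with Reduce instead of collecting matched strings and using %in%. This base R sketch (the "stanford bike" string is a made-up edge case) also shows a trade-off between the two answers: plain grep matches a term anywhere, even inside another word, while the \b word boundaries used in the regex answer do not:

```r
vehicles <- c('vehicle', 'mazda', 'nissan', 'ford', 'honda', 'chevrolet', 'toyota')
asset_name <- c("2001 honda civic", "2003 nissan altima", "2005 mazda 5",
                "unmatched1", "stanford bike")  # last string is a made-up edge case

# One logical vector per term (TRUE where that term occurs anywhere),
# combined element-wise with | so each asset_name gets a single TRUE/FALSE
loose <- Reduce(`|`, lapply(vehicles, grepl, x = asset_name))

# Same idea, but each term must appear as a whole word (\b on both sides)
strict <- Reduce(`|`, lapply(paste0("\\b", vehicles, "\\b"), grepl,
                             x = asset_name, perl = TRUE))

loose_type <- ifelse(loose, 'vehicle', asset_name)
loose_type
# "vehicle" "vehicle" "vehicle" "unmatched1" "vehicle"   ("stanford" contains "ford")

strict_type <- ifelse(strict, 'vehicle', asset_name)
strict_type
# "vehicle" "vehicle" "vehicle" "unmatched1" "stanford bike"
```

Inside mutate(), the same expression can be used directly, for example asset_type = ifelse(Reduce(`|`, lapply(vehicles, grepl, x = asset_name)), 'vehicle', asset_name), since asset_name there refers to the data frame column.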
