Can you combine case_when and startsWith in R for complex groupings

I am trying to group several complex categories whose strings start the same way.

Below is the first case_when clause, which is already very long (I have shortened the strings for brevity).

Is there a way to write a case_when statement that groups all values that start with 'Conditions'? (The same would apply to the other clauses, e.g. those starting with 'Mental health'.)

Thank you all!

mutate(condition = case_when(
  health_conditions == 'Conditions ABC' |
    health_conditions == 'Conditions DEF' |
    health_conditions == 'HIJ' |
    health_conditions == 'Conditions KLM, Parkinsons)' |
    health_conditions == 'Conditions NOP' ~ 'Conditions'))

CodePudding user response：

We can use a regex with grepl/str_detect to combine those cases:

library(dplyr)
library(stringr)
df1 %>%
   mutate(condition = case_when(str_detect(health_conditions, 
      "^Conditions")|health_conditions == "HIJ" ~ 'Conditions'))

Another option is startsWith:

df1 %>%
   mutate(condition = case_when(startsWith(health_conditions, 
      "Conditions")|health_conditions == "HIJ" ~ "Conditions"))
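
To show how this could extend to the other clauses, here is a minimal sketch with several prefixes and a fallback. The sample data, the "Mental health" group and the "Other" label are assumptions made for illustration; the real category strings are not shown in the question.

```r
library(dplyr)

# Hypothetical sample data (the real strings are not shown in the question)
df1 <- tibble(
  health_conditions = c("Conditions ABC", "Conditions DEF", "HIJ",
                        "Mental health XYZ", "Something else", NA)
)

result <- df1 %>%
  mutate(condition = case_when(
    # prefix match plus one exact-match exception
    startsWith(health_conditions, "Conditions") |
      health_conditions == "HIJ"                   ~ "Conditions",
    startsWith(health_conditions, "Mental health") ~ "Mental health",
    # keep missing values missing instead of labelling them "Other"
    is.na(health_conditions)                       ~ NA_character_,
    # assumed fallback label for everything else
    TRUE                                           ~ "Other"
  ))
```

Note that startsWith() expects a character vector, so a factor column would need as.character() first.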

CodePudding user response：

case_when() is a bit of a code smell. The following base R idiom is simpler and uses %in%:

conds <- c("Conditions DEF",  "HIJ") # add extra as required
df1$condition[df1$health_conditions %in% conds] <- "Conditions"

Or, as suggested in the other answer, a regex might help:

df1$condition[grepl("^Conditions", df1$health_conditions)] <- "Conditions"
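
If there are many such groups, the base R approach could be driven by a small lookup of prefixes. This is only a sketch: the sample data and the `prefixes` vector are hypothetical, and the exact-match exception for "HIJ" is taken from the question.

```r
# Hypothetical sample data
df1 <- data.frame(
  health_conditions = c("Conditions ABC", "HIJ", "Mental health XYZ", "Other thing"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names are the group labels, values are the prefixes
prefixes <- c("Conditions" = "Conditions", "Mental health" = "Mental health")

# Start with every row unassigned
df1$condition <- NA_character_

for (label in names(prefixes)) {
  # which() keeps only definite matches and drops any NA results
  hit <- which(startsWith(df1$health_conditions, prefixes[[label]]))
  df1$condition[hit] <- label
}

# Exact-match exceptions that do not share the prefix
df1$condition[df1$health_conditions %in% c("HIJ")] <- "Conditions"
```

Rows that match no prefix and no exception stay NA, which makes unhandled categories easy to spot.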