I have an outcome name and two character vectors:
n <- 'winner'
p_list <- c('qualified', 'female', 'apple')
df_features <- c('female','qualified','admission','apple_B','apple_C','apple_D')
I want to generate a formula like so given p_list and df_features:
winner ~ apple_B + apple_C + apple_D + female + qualified
Basically, I am given p_list and n, and I want to create a formula with n as the outcome and the elements of p_list as the regressors. However, if an element of p_list is not in df_features, I want to replace it with every element of df_features that has the same text before the underscore (_). So apple would be replaced by apple_B, apple_C and apple_D. Hopefully this makes sense.
How can I do this in R (I prefer a dplyr solution if possible)?
I've tried this so far:
f <- as.formula(paste(n, "~", paste(p_list, collapse = " + ")))
But this does not yet account for df_features or for the replacement of the variable apple.
I can also check which values in p_list are in df_features with p_list %in% df_features, but I am not sure how to use that here.
CodePudding user response:
Use grep to pull out of df_features the elements matching each entry of p_list, then pass the result to reformulate to produce the formula. No packages are used.
reformulate(unlist(sapply(p_list, grep, df_features, value = TRUE)), n)
## winner ~ qualified + female + apple_B + apple_C + apple_D
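To see what the one-liner does, it can help to unpack it into steps. The following is only a sketch of the same idea split into intermediate values; the names matches and term_labels are introduced here for illustration and are not part of the answer above.

```r
n <- 'winner'
p_list <- c('qualified', 'female', 'apple')
df_features <- c('female','qualified','admission','apple_B','apple_C','apple_D')

# For each entry of p_list, keep the elements of df_features that contain it.
# lapply always returns a list, with one element per pattern.
matches <- lapply(p_list, grep, x = df_features, value = TRUE)

# Flatten the list into a single character vector of term labels.
term_labels <- unlist(matches)

# reformulate joins the labels with "+" and puts n on the left-hand side.
f <- reformulate(term_labels, response = n)
f
## winner ~ qualified + female + apple_B + apple_C + apple_D
```

Note that grep treats each entry of p_list as a regular expression and matches it anywhere in the name, so a pattern such as apple would also pick up a column called pineapple if one existed.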
CodePudding user response:
The answer by G. Grothendieck is so good that I almost feel ashamed to post mine. However, I will, as I find that going the long way sometimes gives you additional knowledge of the tool at hand:
as.formula(paste0(n,
                  " ~ ",
                  paste(c(p_list[p_list %in% df_features == TRUE],
                          grep(p_list[p_list %in% df_features == FALSE],
                               df_features,
                               value = TRUE)),
                        collapse = " + ")))
## winner ~ qualified + female + apple_B + apple_C + apple_D
What is in there:
- as.formula converts strings to a formula.
- paste0 pastes together the string stored in n, the tilde and the result of paste.
- paste concatenates the following, joined by the separator given in collapse:
  - those elements of p_list that are in df_features (hence TRUE), and
  - the result of grep on df_features for those that are not a direct match (FALSE), returning the values and not the indexes (value = TRUE).
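One caveat: the grep call above receives all unmatched names as its pattern, and grep only uses the first element of a longer pattern vector, so the approach is only safe when exactly one element of p_list is missing from df_features. A possible generalisation, sketched here under the assumption that expanded columns are always named prefix_suffix, expands each entry separately. The helper expand_term is hypothetical and exists only for this illustration.

```r
n <- 'winner'
p_list <- c('qualified', 'female', 'apple')
df_features <- c('female','qualified','admission','apple_B','apple_C','apple_D')

# Hypothetical helper: return the name itself if it is a column, otherwise
# every column whose name starts with "<name>_".
expand_term <- function(p, features) {
  if (p %in% features) return(p)
  features[startsWith(features, paste0(p, "_"))]
}

# Expand each regressor in turn, then flatten to one character vector.
rhs <- unlist(lapply(p_list, expand_term, features = df_features))

# Build the formula; collapse = " + " joins the terms on the right-hand side.
f <- as.formula(paste(n, "~", paste(rhs, collapse = " + ")))
```

Because startsWith compares literal prefixes rather than regular expressions, apple would not match a column such as pineapple_X. An entry with no matching column expands to nothing and is silently dropped, which may or may not be the behaviour you want.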
