I want to use case_when() in a dplyr pipeline, but I need to combine multiple conditions within it.
For example, suppose I have the following data frame:
| id | purchases |
|---|---|
| a | need |
| a | want |
| a | none |
| b | want |
| b | need |
| c | need |
| c | need |
| c | want |
| d | none |
| d | none |
I want to summarise this per id into a new column, output. Ignoring any "none" observations, if the first two observations for an id are both "need", put "yes"; if an id has no "need" or "want" at all, put "none"; otherwise put "no".
The output should be the following:

| id | output |
|---|---|
| a | no |
| b | no |
| c | yes |
| d | none |
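For a reproducible example, the table above can be built in R as follows. The name `actions` matches the asker's code; using dplyr's re-exported `tibble()` is an assumption, and `data.frame()` would work equally well.

```r
library(dplyr)

# Example data from the question, one row per observation
actions <- tibble(
  id        = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  purchases = c("need", "want", "none", "want", "need",
                "need", "need", "want", "none", "none")
)

# Number of observations per id: a = 3, b = 2, c = 3, d = 2
count(actions, id)
```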
My code:

```r
actions %>%
  group_by(id) %>%
  arrange(id) %>%
  summarise(output = case_when(first(purchases) == "need" & nth(purchases, 2) == "need" ~ "yes",
                               TRUE ~ "no"))
```
I know the code is a bit messy; I don't know how to add the second condition, ignoring "none" observations, when deciding between yes and no.
Answer:
I've tried to place your logic in a small function, f(), which can then be applied to purchases by id:
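```r
library(dplyr)

f <- function(p) {
  # "yes" if the first two values match and are "need" or "want"
  # (isTRUE() guards against single-row groups, where p[2] is NA)
  if (isTRUE(p[1] == p[2]) && p[1] %in% c("need", "want")) return("yes")
  # otherwise "none" if every value is "none", else "no"
  ifelse(all(p == "none"), "none", "no")
}

actions %>% group_by(id) %>% summarise(output = f(purchases))
```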
Output:

```
  id    output
  <chr> <chr>
1 a     no
2 b     no
3 c     yes
4 d     none
```
The function checks whether the first and second values of purchases are equal and either "need" or "want"; if so, it returns "yes". Otherwise, it returns "none" if all purchases values are "none", and "no" if not.
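f() looks only at the raw first two values, so a leading "none" is not skipped, and c("want", "want") also returns "yes". If the rule is meant exactly as worded in the question (drop "none", then require the first two remaining values to be "need"), a variant along these lines may fit better. `f_strict()` is a hypothetical name, not part of the original answer.

```r
library(dplyr)

# Example data from the question
actions <- tibble(
  id        = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  purchases = c("need", "want", "none", "want", "need",
                "need", "need", "want", "none", "none")
)

# Hypothetical helper: applies the rule as worded in the question
f_strict <- function(p) {
  p <- p[p != "none"]                      # ignore "none" observations
  if (length(p) == 0) return("none")       # every observation was "none"
  if (length(p) >= 2 && all(p[1:2] == "need")) return("yes")  # first two kept are "need"
  "no"
}

result <- actions %>% group_by(id) %>% summarise(output = f_strict(purchases))
result

f_strict(c("none", "need", "need"))  # "yes": the leading "none" is skipped
f_strict(c("want", "want"))          # "no": only "need" counts
```

The `length(p) >= 2 &&` check short-circuits, so ids with a single non-"none" value return "no" instead of raising an error.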
Answer:
Try this, using case_when():
```r
library(dplyr)

actions %>%
  group_by(id) %>%
  summarise(output = case_when(
    isTRUE(intersect(purchases[[1]], purchases[[2]]) == "none") ~ "none",
    isTRUE(intersect(purchases[[1]], purchases[[2]]) %in% c("need", "want")) ~ "yes",
    TRUE ~ "no"
  ))
```
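In this answer, `intersect(x, y)` returns the values common to both, or `character(0)` when the first two purchases differ; comparing `character(0)` gives `logical(0)`, which `isTRUE()` turns into `FALSE`. Like f(), it compares only the raw first two values, so a "none" in first or second place is not skipped and two leading "want" values give "yes". A sketch that stays within case_when() but follows the rule as worded in the question could number each id's non-"none" rows with `cumsum()` first. The helper column `kept_pos` is an illustrative assumption, not part of the answer.

```r
library(dplyr)

# Example data from the question
actions <- tibble(
  id        = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  purchases = c("need", "want", "none", "want", "need",
                "need", "need", "want", "none", "none")
)

result <- actions %>%
  group_by(id) %>%
  # kept_pos (hypothetical helper column): position of each row among the
  # non-"none" rows of its id; "none" rows just repeat the previous count
  mutate(kept_pos = cumsum(purchases != "none")) %>%
  summarise(output = case_when(
    all(purchases == "none") ~ "none",                      # every observation is "none"
    sum(purchases == "need" & kept_pos <= 2) == 2 ~ "yes",  # first two kept rows are "need"
    TRUE ~ "no"
  ))

result
```

Because positions 1 and 2 each occur exactly once among the non-"none" rows, counting two "need" rows with `kept_pos <= 2` is equivalent to "the first two remaining observations are both need".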
