I would like to create a new column, fill it only when a specific condition is met (here x > 2), and then directly overwrite another existing column (here auxiliary) for the rows where that condition (x > 2) returned TRUE.
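library(dplyr)
# y is stored as double so that it matches the <dbl> column type printed below
df <- tibble(x = 1:5, y = as.numeric(1:5), auxiliary = NA)
# A tibble: 5 x 3
      x     y auxiliary
  <int> <dbl> <lgl>
1     1     1 NA
2     2     2 NA
3     3     3 NA
4     4     4 NA
5     5     5 NA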
I can do this successfully with two separate calls within mutate():
df %>%
  mutate(result = if_else(condition = x > 2,
                          true = x + y,
                          false = NA_real_),
         auxiliary = if_else(condition = x > 2,
                             true = "Calculation done",
                             false = NA_character_))
# A tibble: 5 x 4
      x     y auxiliary        result
  <int> <dbl> <chr>             <dbl>
1     1     1 NA                   NA
2     2     2 NA                   NA
3     3     3 Calculation done      6
4     4     4 Calculation done      8
5     5     5 Calculation done     10
But the condition (x > 2) is repeated, which, in more complex cases, makes the code hard to read and error-prone, especially when there are multiple conditions.
Is there a way to simplify the code above so that the condition is not repeated?
- Create a new variable (mutate())
- Fill it only if the condition is matched (if_else() or case_when())
- Write another column's value only if the row's condition is matched (I'm stuck here)
Something that would look like this (pseudocode, not valid R):
df %>%
  mutate(result = case_when(
    x > 2 ~ x + y & auxiliary = "Calculation done", # we'd add the column reference here...
    TRUE ~ NA_real_ & auxiliary = NA_character_))
Many thanks! Any solution from the tidyverse would be ideal.
CodePudding user response:
I would suggest saving the condition that is used multiple times as a string and then evaluating that string wherever the condition is needed, e.g.:
# Store the condition once as text; it is parsed and evaluated inside mutate()
condition <- "x > 2"
df %>%
  mutate(result = ifelse(eval(parse(text = condition)),
                         x + y,
                         NA),
         auxiliary = ifelse(eval(parse(text = condition)),
                            "Calculation done",
                            NA))
Note that I am using base ifelse() to avoid having to supply the same type in both branches ("dplyr::if_else is specifically written to force you to have the same type in your true and false arguments."). For further information on that, see e.g. "Different behavior of if else statement and if_else".
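To make the type remark concrete, here is a minimal sketch comparing the two functions on the same kind of input. The variable names (base_result, strict_result, none_true) are only illustrative, and how strictly if_else() checks types may depend on the installed dplyr version, so the typed NA_character_ is used to stay on the safe side.

```r
library(dplyr)

x <- 1:5

# Base ifelse(): the bare logical NA is coerced to character as the result is built
base_result <- ifelse(x > 2, "Calculation done", NA)

# dplyr::if_else(): a typed NA keeps both branches of the same type
strict_result <- if_else(x > 2, "Calculation done", NA_character_)

# With a mix of TRUE and FALSE, both approaches give the same character vector
same_when_mixed <- identical(base_result, strict_result)

# When no element is TRUE, ifelse() never uses the "yes" value,
# so the result silently stays logical instead of character
none_true <- c(FALSE, FALSE)
base_class <- class(ifelse(none_true, "Calculation done", NA))                   # "logical"
strict_class <- class(if_else(none_true, "Calculation done", NA_character_))     # "character"
```

The trade-off is that ifelse() is more permissive, while if_else() keeps the column type predictable even for inputs where the condition never holds.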
CodePudding user response:
You can save the result of the condition in a column and use that to avoid evaluating the same condition again and again.
library(dplyr)
df <- tibble(x = 1:5, y = 1:5)
df %>%
  # Evaluate the condition once and reuse the stored logical column
  mutate(condition = x > 2,
         result = if_else(condition,
                          true = x + y,
                          false = NA_integer_),
         auxiliary = if_else(condition,
                             true = "Calculation done",
                             false = NA_character_))
#       x     y condition result auxiliary
#   <int> <int> <lgl>      <int> <chr>
# 1     1     1 FALSE         NA NA
# 2     2     2 FALSE         NA NA
# 3     3     3 TRUE           6 Calculation done
# 4     4     4 TRUE           8 Calculation done
# 5     5     5 TRUE          10 Calculation done
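If the helper column is not wanted in the final result, it can be dropped at the end of the pipeline. The following is a minimal sketch of that variation, assuming the same df as above; the name out is only illustrative.

```r
library(dplyr)

df <- tibble(x = 1:5, y = 1:5)

out <- df %>%
  # Evaluate the condition once and store it as a temporary column
  mutate(condition = x > 2,
         result = if_else(condition, x + y, NA_integer_),
         auxiliary = if_else(condition, "Calculation done", NA_character_)) %>%
  # Remove the temporary column once both derived columns are filled
  select(-condition)
```

This way the condition is written once, and the output contains only x, y, result and auxiliary.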
