I have a question about conditionally creating a new column with dplyr::mutate and ifelse/if_else.
I will use the iris dataset as an example. First I define a general setting:
setting1 <- TRUE  # general setting
I want to create a new column based on the value of another column (here Sepal.Length), but with a different formula depending on the value of the boolean setting1. My first attempt, following an example I found somewhere (I'm quite new to R), was:
iris2 <- iris %>% slice_head(n=5) %>%
mutate( NewSL = ifelse(setting1, Sepal.Length*10., Sepal.Length/10.) )
But the calculated value is wrong: the result for the first row appears to be propagated to every row:
iris2
Sepal.Length Sepal.Width Petal.Length Petal.Width Species NewSL
1 5.1 3.5 1.4 0.2 setosa 51
2 4.9 3.0 1.4 0.2 setosa 51
3 4.7 3.2 1.3 0.2 setosa 51
4 4.6 3.1 1.5 0.2 setosa 51
5 5.0 3.6 1.4 0.2 setosa 51
I then understood that this was a length problem, which becomes obvious when I replace ifelse with if_else:
iris2 <- iris %>% slice_head(n=5) %>%
mutate( NewSL = if_else(setting1, Sepal.Length*10., Sepal.Length/10.) )
Error: Problem with `mutate()` column `NewSL`.
i `NewSL = if_else(setting1, Sepal.Length * 10, Sepal.Length/10)`.
x `true` must be length 1 (length of `condition`), not 5.
Run `rlang::last_error()` to see where the error occurred.
I can use this ugly workaround to make it work:
rm(iris2)
iris2 <- iris %>% slice_head(n=5) %>%
mutate( NewSL = if_else( rep(setting1, n()), Sepal.Length*10., Sepal.Length/10.) )
and in fact, in my real case the false part is a constant, so I have to do that twice:
iris2 <- iris %>% slice_head(n=5) %>%
mutate( NewSL = if_else( rep(setting1, n()), Sepal.Length*10., rep(10.0, n()) ) )
My question is: is there an elegant, concise, tidyverse-style way to make this work without the rep() trick?
Most examples of the mutate/if_else combination use a column of the dataset in the condition, not a constant, so there is no length issue in those cases.
Of note, I also managed to get the correct output using other approaches:
- either using base R syntax
iris2 <- iris
if (setting1) {
iris2$NewSL= iris2$Sepal.Length*10.
} else {
iris2$NewSL = iris2$Sepal.Length/10.
# or iris2$NewSL = 10.0
}
- or using conditional piping to stay in the tidyverse syntax, which seems to work correctly, but which I also find quite verbose and less readable for such a simple case
iris2 <- iris %>% slice_head(n=5) %>%
{if(setting1) mutate(., NewSL = Sepal.Length*10.)
else mutate(., NewSL = Sepal.Length/10.) }
I'd like to know the most efficient way to do it properly, using the tidyverse syntax. Thanks in advance for your time.
Answer:
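ifelse/if_else are vectorized: they expect a condition vector and return a result of the same length as that condition. Here setting1 is a single TRUE/FALSE, so ifelse returns only one value (which mutate then recycles to every row) and if_else raises an error. For a scalar condition, keep using a plain if/else, as you have already identified; it works directly inside mutate.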
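To see the length behaviour in isolation, here is a minimal base R sketch. The vector x and its values are made up for illustration; no package is needed.

```r
# Illustration values only (not taken from a real dataset column).
setting1 <- TRUE
x <- c(5.1, 4.9, 4.7)

# ifelse() shapes its result after the test. A length-1 test gives a
# length-1 result, so only the first element of the "yes" vector is kept.
res_ifelse <- ifelse(setting1, x * 10, x / 10)
length(res_ifelse)  # 1

# A plain if/else selects one whole branch, so the full vector is returned.
res_if <- if (setting1) x * 10 else x / 10
length(res_if)      # 3
```

That single value from ifelse is what mutate recycled down the NewSL column in the question.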
library(dplyr)
setting1 = TRUE
iris %>%
slice_head(n=5) %>%
mutate( NewSL = if(setting1) Sepal.Length*10 else Sepal.Length/10)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species NewSL
#1 5.1 3.5 1.4 0.2 setosa 51
#2 4.9 3.0 1.4 0.2 setosa 49
#3 4.7 3.2 1.3 0.2 setosa 47
#4 4.6 3.1 1.5 0.2 setosa 46
#5 5.0 3.6 1.4 0.2 setosa 50
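For the case mentioned in the question where the false part is a constant, the same pattern should work without rep(), because mutate recycles a length-1 result to every row. This is a minimal sketch; setting1 is set to FALSE here only to exercise the constant branch.

```r
library(dplyr)

setting1 <- FALSE  # flipped for this example so the constant branch is used

iris2 <- iris %>%
  slice_head(n = 5) %>%
  # The if/else yields either a length-5 vector or a single number;
  # mutate() recycles the single number to all rows.
  mutate(NewSL = if (setting1) Sepal.Length * 10 else 10)

iris2$NewSL
```

The condition is evaluated once for the whole data frame rather than row by row, which matches the intent of a global setting.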
