R - Looking for a better syntax in a dplyr::mutate/if_else combination

Time:01-30

I have a question about conditionally creating a new column with dplyr::mutate and ifelse/if_else.

I will use the iris dataset as an example. First I define a general setting

setting1 = TRUE; # general setting

I want to create a new column based on the value of another (here Sepal.Length), but with a different formula depending on the boolean value of setting1. My first attempt was the following (adapted from an example found somewhere; I'm quite new to R):

iris2 <- iris %>% slice_head(n=5) %>%
  mutate( NewSL = ifelse(setting1, Sepal.Length*10., Sepal.Length/10.) )

But the computed value is wrong: the value from the first row is propagated to every row.

iris2
 Sepal.Length Sepal.Width Petal.Length Petal.Width Species NewSL
 1          5.1         3.5          1.4         0.2  setosa    51
 2          4.9         3.0          1.4         0.2  setosa    51
 3          4.7         3.2          1.3         0.2  setosa    51
 4          4.6         3.1          1.5         0.2  setosa    51
 5          5.0         3.6          1.4         0.2  setosa    51

I then understood that this is a length problem, which becomes explicit when I replace ifelse with if_else:

iris2 <- iris %>% slice_head(n=5) %>%
  mutate( NewSL = if_else(setting1, Sepal.Length*10., Sepal.Length/10.) )

Error: Problem with mutate() column NewSL.
i NewSL = if_else(setting1, Sepal.Length * 10, Sepal.Length/10).
x true must be length 1 (length of condition), not 5.
Run rlang::last_error() to see where the error occurred.
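The length rule behind both symptoms can be reproduced in base R, outside mutate(). The following is a minimal sketch, assuming a plain numeric vector x that stands in for the first five Sepal.Length values: ifelse() returns a result shaped like its test argument, so a length-1 test yields a length-1 result (which mutate() then recycles down the column), whereas a plain if/else evaluates the chosen branch as a whole vector.

```r
# x is an illustrative stand-in for the first five Sepal.Length values of iris
x <- c(5.1, 4.9, 4.7, 4.6, 5.0)
setting1 <- TRUE

# The test has length 1, so ifelse() keeps only the first element of x * 10
res_scalar <- ifelse(setting1, x * 10, x / 10)
print(length(res_scalar))

# Plain if/else picks one branch and returns it at full length
res_if <- if (setting1) x * 10 else x / 10
print(length(res_if))
```

if_else() applies the same rule but raises an error instead of silently truncating, which is why it surfaces the problem.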

I can do this ugly thing to make it work

rm(iris2)
iris2 <- iris %>% slice_head(n=5) %>%
  mutate( NewSL = if_else( rep(setting1, n()), Sepal.Length*10., Sepal.Length/10.) )

and in fact, in my real case the false part is a constant, so I have to do that twice

iris2 <- iris %>% slice_head(n=5) %>%
  mutate( NewSL = if_else( rep(setting1, n()), Sepal.Length*10., rep(10.0, n()) ) )

My question is: is there an elegant, concise, tidy-like way to make this work properly without the rep() trick?

Most examples of the mutate/if_else combination use a column of the dataset in the condition part, not a constant, so no issue with lengths in this case
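For contrast, here is a sketch of that usual pattern, where the condition is itself a vector as long as the data. The 4.8 threshold and the name iris3 are arbitrary choices made only for illustration.

```r
library(dplyr)

# The condition Sepal.Length > 4.8 has one value per row, so if_else()
# receives three arguments of equal length and picks a branch row by row
iris3 <- iris %>%
  slice_head(n = 5) %>%
  mutate(NewSL = if_else(Sepal.Length > 4.8, Sepal.Length * 10, Sepal.Length / 10))

print(iris3$NewSL)
```

Here rows 3 and 4 (4.7 and 4.6) take the false branch, while the other three rows take the true branch.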

Of note, I also managed to have the correct output using other approaches

  1. either using R base syntax
iris2 <- iris
if (setting1) {
  iris2$NewSL = iris2$Sepal.Length*10.
} else {
  iris2$NewSL = iris2$Sepal.Length/10.
  # or iris2$NewSL = 10.0
}
  2. or using conditional piping to stay in the tidyverse syntax, which seems to work correctly, but which I also find quite verbose and less readable for such a simple case
iris2 <- iris %>% slice_head(n=5) %>%
  {if(setting1) mutate(., NewSL = Sepal.Length*10.) 
    else mutate(., NewSL = Sepal.Length/10.) }

I'd like to know the most efficient way to do it properly, using the tidyverse syntax. Thanks in advance for your time.

Answer:

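ifelse/if_else are vectorised: they expect a condition with one value per row and return a result of the same length as that condition. For a single TRUE/FALSE setting, keep using plain if/else as you have already identified; it can be written directly inside mutate().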

library(dplyr)

setting1 = TRUE

iris %>% 
  slice_head(n=5) %>%
  mutate( NewSL = if(setting1) Sepal.Length*10 else Sepal.Length/10)

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species NewSL
#1          5.1         3.5          1.4         0.2  setosa    51
#2          4.9         3.0          1.4         0.2  setosa    49
#3          4.7         3.2          1.3         0.2  setosa    47
#4          4.6         3.1          1.5         0.2  setosa    46
#5          5.0         3.6          1.4         0.2  setosa    50
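The same pattern should also cover the case from the question where the false branch is a constant. The sketch below assumes mutate() recycles a length-1 value to the number of rows, so no rep() is needed. setting1 is set to FALSE here only to exercise the constant branch.

```r
library(dplyr)

setting1 <- FALSE  # flipped for illustration, so the constant branch is taken

iris2 <- iris %>%
  slice_head(n = 5) %>%
  # The else branch is a single number; mutate() fills the column with it
  mutate(NewSL = if (setting1) Sepal.Length * 10 else 10)

print(iris2$NewSL)
```

Because if/else evaluates only the selected branch, the two branches do not need to have the same length as each other.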