How to perform a multi-conditional replace in dplyr?

Time:01-10

Suppose we start with the data frame below:

ID <- c(1, 1, 1, 5, 5)
Period <- c(1,2,3,1,2)
Value <- c(10,12,11,4,6)
df <- data.frame(ID, Period, Value)

  ID Period Value
1  1      1    10
2  1      2    12
3  1      3    11
4  5      1     4
5  5      2     6

The dplyr code below adds a "Calculate" column by multiplying the Period and Value columns, with one exception: the last row of each grouped ID is assigned 0 if that row's Value is > 10:

df %>% 
    mutate(Calculate = Period * Value) %>%
    group_by(ID) %>%
    mutate(Calculate = case_when(Value > 10 ~ replace(Calculate, n(), 0), TRUE ~ Period * Value)) %>%
    ungroup

     ID Period Value Calculate
  <dbl>  <dbl> <dbl>     <dbl>
1     1      1    10        10
2     1      2    12        24
3     1      3    11         0
4     5      1     4         4
5     5      2     6        12

The problem with the above is the TRUE ~ Period * Value portion of the case_when() call. Although it works fine in this simple example, in the actual code this is intended for, the calculation behind TRUE ~ ... is more complicated than Period * Value and cannot be repeated in the "then" portion of the conditional. In the second mutate() above, I need to preserve the original Calculate value and replace it only if both Value > 10 AND we are in the last row of a grouped ID (row n()). It's as if we need a one-sided conditional rather than the two-sided conditionals of case_when(), if_else(), etc. I've searched for one-sided conditionals in dplyr and R with no luck so far.

Any ideas for how to do this?

CodePudding user response:

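A possible solution: compute Calculate in the first mutate() (that is, outside if_else()), where it can be as complicated a calculation as you need. Then use if_else() in the grouped mutate() to set only the last row of each ID to 0 when Value > 10, passing the existing Calculate column as the fallback: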

library(tidyverse)

ID <- c(1, 1, 1, 5, 5)
Period <- c(1,2,3,1,2)
Value <- c(10,12,11,4,6)
df <- data.frame(ID, Period, Value)

df %>% 
  mutate(Calculate = Period * Value) %>% 
  group_by(ID) %>% 
  mutate(Calculate = if_else(row_number() == n() & Value > 10, 0, Calculate)) %>% 
  ungroup

#> # A tibble: 5 × 4
#>      ID Period Value Calculate
#>   <dbl>  <dbl> <dbl>     <dbl>
#> 1     1      1    10        10
#> 2     1      2    12        24
#> 3     1      3    11         0
#> 4     5      1     4         4
#> 5     5      2     6        12
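
The question asks for a "one-sided" conditional. One way to get close to that, sketched here as an alternative rather than as part of the answer above, is base R's replace() with a logical mask: it overwrites only the positions where the mask is TRUE and leaves the rest of the column untouched, so no fallback expression is needed. The data frame is the one from the question, and Period * Value stands in for the more complicated calculation.

```r
library(dplyr)

df <- data.frame(
  ID     = c(1, 1, 1, 5, 5),
  Period = c(1, 2, 3, 1, 2),
  Value  = c(10, 12, 11, 4, 6)
)

result <- df %>%
  # Stand-in for the more complicated calculation from the question
  mutate(Calculate = Period * Value) %>%
  group_by(ID) %>%
  # replace() sets Calculate to 0 only where the mask is TRUE
  # (last row of the group AND Value > 10); other rows keep their value
  mutate(Calculate = replace(Calculate, row_number() == n() & Value > 10, 0)) %>%
  ungroup()

result
```

If case_when() is preferred, the same effect should be achievable by using the already computed column as the fallback, i.e. TRUE ~ Calculate instead of TRUE ~ Period * Value.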

CodePudding user response:

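If the values only need to be set to 0, we can multiply by a logical vector: FALSE is coerced to 0, so those rows become 0, while TRUE is coerced to 1, so those rows keep their original value. This assumes Calculate is numeric and has no NA or infinite values in the rows being zeroed, since NA * 0 is NA and Inf * 0 is NaN: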

library(dplyr)
df %>% 
  mutate(Calculate = Period * Value) %>% 
  group_by(ID) %>% 
  mutate(Calculate = Calculate * !(row_number() == n() & Value > 10)) %>% 
  ungroup

-output

# A tibble: 5 × 4
     ID Period Value Calculate
  <dbl>  <dbl> <dbl>     <dbl>
1     1      1    10        10
2     1      2    12        24
3     1      3    11         0
4     5      1     4         4
5     5      2     6        12
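
One caveat with multiplying by a logical vector is worth illustrating: multiplication by 0 does not force every value to 0. The sketch below uses made-up values (calc and mask are hypothetical names, not from the question) to compare the multiplication trick with replace() on a vector that contains NA and Inf.

```r
# Hypothetical values chosen to include the edge cases
calc <- c(10, NA, Inf)
mask <- c(FALSE, TRUE, TRUE)   # TRUE marks elements that should become 0

# Multiplication approach: keep where mask is FALSE, multiply by 0 elsewhere
by_multiplication <- calc * (!mask)

# replace() approach: assign 0 directly at the masked positions
by_replace <- replace(calc, mask, 0)

print(by_multiplication)
print(by_replace)
```

On this input the multiplied vector keeps a missing value and produces NaN for the infinite element, while replace() yields 0 in both positions. For clean numeric data like the example in the question, the two approaches give the same result.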