Suppose we start with the data frame below:
ID <- c(1, 1, 1, 5, 5)
Period <- c(1,2,3,1,2)
Value <- c(10,12,11,4,6)
df <- data.frame(ID, Period, Value)
  ID Period Value
1  1      1    10
2  1      2    12
3  1      3    11
4  5      1     4
5  5      2     6
The dplyr code below adds a "Calculate" column equal to Period * Value, except that the last row of each ID group is set to 0 when that row's Value is greater than 10:
library(dplyr)
df %>%
  mutate(Calculate = Period * Value) %>%
  group_by(ID) %>%
  mutate(Calculate = case_when(Value > 10 ~ replace(Calculate, n(), 0), TRUE ~ Period * Value)) %>%
  ungroup()
     ID Period Value Calculate
  <dbl>  <dbl> <dbl>     <dbl>
1     1      1    10        10
2     1      2    12        24
3     1      3    11         0
4     5      1     4         4
5     5      2     6        12
The problem is the TRUE ~ Period * Value branch of case_when(). It works in this simple example, but in the actual code the calculation behind TRUE ~ ... is much more complicated than Period * Value and cannot be repeated inside the conditional. In the second mutate() above, I need to preserve the existing Calculate value and replace it only if both Value > 10 and the row is the last row of its ID group (row n()). In effect I need a one-sided conditional, rather than the two-sided conditionals of case_when(), if_else(), and so on. I have searched for one-sided conditionals in dplyr and R without success.
Any ideas for how to do this?
CodePudding user response:
One possible solution: compute Calculate in the first mutate(), outside if_else(), so that it can be as complicated a calculation as you need, and then overwrite it only where the condition holds:
library(tidyverse)
ID <- c(1, 1, 1, 5, 5)
Period <- c(1,2,3,1,2)
Value <- c(10,12,11,4,6)
df <- data.frame(ID, Period, Value)
df %>%
  mutate(Calculate = Period * Value) %>%
  group_by(ID) %>%
  mutate(Calculate = if_else(row_number() == n() & Value > 10, 0, Calculate)) %>%
  ungroup()
#> # A tibble: 5 × 4
#>      ID Period Value Calculate
#>   <dbl>  <dbl> <dbl>     <dbl>
#> 1     1      1    10        10
#> 2     1      2    12        24
#> 3     1      3    11         0
#> 4     5      1     4         4
#> 5     5      2     6        12
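If you want something closer to a literal one-sided conditional, base R's replace() with a logical index may also work, since it only touches the positions where the condition is TRUE and leaves everything else as it was. The following is a minimal sketch, assuming the same df as in the question and treating Period * Value as a stand-in for the real, more complicated calculation; the name `result` is only illustrative:

```r
library(dplyr)

df <- data.frame(
  ID     = c(1, 1, 1, 5, 5),
  Period = c(1, 2, 3, 1, 2),
  Value  = c(10, 12, 11, 4, 6)
)

result <- df %>%
  # Stand-in for the complicated calculation; it is written only once
  mutate(Calculate = Period * Value) %>%
  group_by(ID) %>%
  # replace() overwrites only the elements where the logical index is TRUE,
  # here the last row of each group when its Value exceeds 10
  mutate(Calculate = replace(Calculate, row_number() == n() & Value > 10, 0)) %>%
  ungroup()

result$Calculate
#> [1] 10 24  0  4 12
```

There is no "else" branch to write here, so the original calculation never has to be repeated.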
CodePudding user response:
If the replacement value is always 0, we can multiply by a logical vector instead: FALSE is coerced to 0, so those rows become 0, while TRUE is coerced to 1 and leaves the original value unchanged (assuming the column is numeric):
library(dplyr)
df %>%
  mutate(Calculate = Calculate <- Period * Value) %>%
  group_by(ID) %>%
  mutate(Calculate = Calculate * !(row_number() == n() & Value > 10)) %>%
  ungroup()
Output:
# A tibble: 5 × 4
     ID Period Value Calculate
  <dbl>  <dbl> <dbl>     <dbl>
1     1      1    10        10
2     1      2    12        24
3     1      3    11         0
4     5      1     4         4
5     5      2     6        12
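One caveat with the multiplication trick, in case the real Calculate column can contain missing or infinite values: multiplying by 0 does not turn NA or Inf into 0. The sketch below uses made-up vectors (`calc` and `flag` are hypothetical names, not from the question) to compare it with replace():

```r
# Hypothetical values: one ordinary number, one NA, one Inf
calc <- c(10, NA, Inf)
# TRUE marks the positions that should be set to 0
flag <- c(FALSE, TRUE, TRUE)

# Multiplication: NA * 0 stays NA and Inf * 0 becomes NaN
masked <- calc * !flag

# replace() assigns 0 directly, whatever the old value was
replaced <- replace(calc, flag, 0)

print(masked)
print(replaced)
```

For finite, non-missing numeric data, as in the example above, both approaches give the same result.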
