Home > Back-end >  Conditional cumulative sum from two columns
Conditional cumulative sum from two columns

Time:01-15

I can't get my head around the following problem.

Assuming the follwoing data:

library(tidyverse)
df <- tibble(source = c("A", "A", "B", "B", "B", "C"),
             value  = c(5, 10, NA, NA, NA, 20),
             add    = c(1, 1, 1, 2, 3, 4))

What I want to do is: for all cases where source == "B", I want to calculate the cumulative sum of the previous row's value and the current row's add. Of course, for the first "B" row, I need to provide a starting value for value. Note: in this case, it would be fine if we just take the value from the last "A" row.

So for row 3, the result would be 10 1 = 11.
For row 4, the result would be 11 2 = 13.
For row 5, the results would be 13 3 = 16.

I tried to use purrr::accumulate, but I failed in many different ways, e.g. I thought I can do:

df %>%
  mutate(test = accumulate(add, .init = 10, ~.x   .y))

But this leads to error:

Error: Problem with `mutate()` column `test`.
i `test = accumulate(add, .init = 10, ~.x   .y)`.
i `test` must be size 6 or 1, not 7.

Same if I use .init = value

And I also didn't manage to do the job only on group B (although this is probably no issue, I think I can probably performa on the full data frame and then just replace values for all non-B rows).

Expected output:

# A tibble: 6 x 4
  source value   add  test
  <chr>  <dbl> <dbl> <dbl>
1 A          5     1    NA
2 A         10     1    NA
3 B         NA     1    11
4 B         NA     2    13
5 B         NA     3    16
6 C         20     4    NA

CodePudding user response:

You were essentially in the right direction. Since you provide an .init value to accumulate, the resulting vector is of size n 1, with the first value being .init. You have to remove the first value to get a vector that fit to your column size.

Then, if you want NAs on the remaining values, here's a way to do it. Also, since the "starting row" is the third, .init has to be set to 8.

df %>%
  mutate(test = 
           ifelse(source == "B", accumulate(add, .init = 8, ~.x   .y)[-1], NA))

# A tibble: 6 x 4
  source value   add  test
  <chr>  <dbl> <dbl> <dbl>
1 A          5     1    NA
2 A         10     1    NA
3 B         NA     1    11
4 B         NA     2    13
5 B         NA     3    16
6 C         20     4    NA

CodePudding user response:

@tmfmnk provided an awesome answer and they deserve full credit (NOT ME)

Below is the same code from their comment (for more visibility, while also setting an initial value)

init_value = 10
df = df %>%     
  mutate(test = lag(value)) %>%  
  group_by(source) %>%    
  mutate(test = init_value   cumsum(add))
  •  Tags:  
  • Related