My objective is to compute a cumulative sum over the elements of a vector and assign the running total to each element, but to reset the cumulative sum whenever a certain condition is met.
For example:
vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)
Now, suppose that the condition to reset the cumulative sum is that the next element has a different sign.
Then the desired output is:
vector_B <- c(1, 2, -1, -2, -3, 1, -1, -2, 1, -1)
How can I achieve this?
CodePudding user response:
A base R option with Reduce
> Reduce(function(x, y) ifelse(x * y > 0, x + y, y), vector_A, accumulate = TRUE)
[1] 1 2 -1 -2 -3 1 -1 -2 1 -1
or using ave + cumsum
> ave(vector_A, cumsum(c(1, diff(sign(vector_A)) != 0)), FUN = cumsum)
[1] 1 2 -1 -2 -3 1 -1 -2 1 -1
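To see why the ave + cumsum version works, it may help to build the grouping key step by step. This is only a sketch of the intermediate values; the names sign_change and group_id are made up here for illustration and do not appear in the answer above.

```r
vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)

# TRUE wherever the sign differs from the previous element.
# The leading 1 marks the first element as the start of a group.
sign_change <- c(1, diff(sign(vector_A))) != 0

# A running count of the changes gives every run of equal sign its own id.
group_id <- cumsum(sign_change)
group_id
#> [1] 1 1 2 2 2 3 4 4 5 5

# ave() then applies cumsum() separately within each id.
result <- ave(vector_A, group_id, FUN = cumsum)
result
#> [1]  1  2 -1 -2 -3  1 -1 -2  1 -1
```

Because the key only depends on sign(), any other reset condition could be swapped in by changing how sign_change is computed.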
CodePudding user response:
Using ave:
ave(vector_A, data.table::rleid(sign(vector_A)), FUN = cumsum)
# [1] 1 2 -1 -2 -3 1 -1 -2 1 -1
A formula version of accumulate:
purrr::accumulate(vector_A, ~ ifelse(sign(.x) == sign(.y), .x + .y, .y))
# [1] 1 2 -1 -2 -3 1 -1 -2 1 -1
CodePudding user response:
You can use a custom function instead of cumsum and accumulate results using e.g. purrr::accumulate:
library(purrr)
vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)
purrr::accumulate(vector_A, function(a, b) {
  if (sign(a) == sign(b)) {
    a + b  # same sign: keep accumulating
  } else {
    b      # sign changed: restart from the current element
  }
})
[1] 1 2 -1 -2 -3 1 -1 -2 1 -1
or if you want to avoid any branch:
purrr::accumulate(vector_A, function(a, b) { b + a * (sign(a) == sign(b)) })
[1] 1 2 -1 -2 -3 1 -1 -2 1 -1
CodePudding user response:
The approach that comes to mind is to find the runs (rle()) defined by the
condition (sign()) in the data, apply cumsum() on each run separately
(tapply()), and then concatenate back into a vector (unlist()). Something
like this:
vector_A <- c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1)
run_length <- rle(sign(vector_A))$lengths
run_id <- rep(seq_along(run_length), run_length)
unlist(tapply(vector_A, run_id, cumsum), use.names = FALSE)
#> [1] 1 2 -1 -2 -3 1 -1 -2 1 -1
Wrapping the process up a bit, I’d maybe put finding the grouping factor (run
index) in a function? And then the grouped summary will need to be done using
existing tools, like tapply() above, or a creative ave(), or in the
context of data frames, a group_by() and summarise() with dplyr.
run_index <- function(x) {
with(rle(x), rep(seq_along(lengths), lengths))
}
ave(vector_A, run_index(sign(vector_A)), FUN = cumsum)
#> [1] 1 2 -1 -2 -3 1 -1 -2 1 -1
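For the data frame case mentioned above, a rough sketch with dplyr could look like the following. It assumes dplyr is installed; the data frame df and the column names run and vector_B are invented for this example. It uses mutate() rather than summarise(), on the assumption that you want one cumulative value per row rather than one value per group.

```r
library(dplyr)

# Same helper as above: one integer id per run of identical values.
run_index <- function(x) {
  with(rle(x), rep(seq_along(lengths), lengths))
}

# Hypothetical data frame holding the example vector.
df <- data.frame(vector_A = c(1, 1, -1, -1, -1, 1, -1, -1, 1, -1))

out <- df %>%
  group_by(run = run_index(sign(vector_A))) %>%  # group rows by sign run
  mutate(vector_B = cumsum(vector_A)) %>%        # cumulative sum within each run
  ungroup()

out$vector_B
#> [1]  1  2 -1 -2 -3  1 -1 -2  1 -1
```

The run column is kept in the result, which can be handy for checking where the resets happened.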
