Consider the vectors
group = rep(1:6, each = 2)
x = 1:12
Now, I want to compute a cumulative sum by group if any member of the group meets a condition. The condition is, e.g., that x %% 3 == 0.
## Without the cumulative sum
ave(x, group, FUN = function(x) any(x %% 3 == 0))
# [1] 0 0 1 1 1 1 0 0 1 1 1 1
## With the cumulative sum
ave(x, group, FUN = function(x) cumsum(any(x %% 3 == 0)))
# [1] 0 0 1 1 1 1 0 0 1 1 1 1
## Expected result with cumsum:
# [1] 0 0 1 2 1 2 0 0 1 2 1 2
This also arises in dplyr:
library(dplyr)
dWithoutCumsum <- data.frame(group, x) %>%
  group_by(group) %>%
  mutate(z = any(x %% 3 == 0))
dWithCumsum <- data.frame(group, x) %>%
  group_by(group) %>%
  mutate(z = cumsum(any(x %% 3 == 0)))
all.equal(dWithCumsum, dWithoutCumsum)
# [1] TRUE
Moreover, when cumsum is applied in a separate step afterwards, the result is as expected:
ave(ave(x, group, FUN = function(x) any(x %% 3 == 0)), group, FUN = cumsum)
# [1] 0 0 1 2 1 2 0 0 1 2 1 2
data.frame(group, x) %>%
  group_by(group) %>%
  mutate(z = any(x %% 3 == 0),
         z = cumsum(z)) %>%
  pull(z)
# [1] 0 0 1 2 1 2 0 0 1 2 1 2
Why does cumsum not work as expected in these cases (the same happens with all instead of any), and is it possible to get the expected result in one line?
CodePudding user response:
My understanding is that you want to return an increasing sequence if you detect at least one multiple of 3 and a zero vector otherwise. In that case:
g <- gl(6, 2)
g
## [1] 1 1 2 2 3 3 4 4 5 5 6 6
## Levels: 1 2 3 4 5 6
x <- seq_along(g)
x
## [1] 1 2 3 4 5 6 7 8 9 10 11 12
f <- function(x) if (any(x %% 3 == 0)) seq_along(x) else integer(length(x))
unsplit(tapply(x, g, f, simplify = FALSE), g)
## [1] 0 0 1 2 1 2 0 0 1 2 1 2
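If a single expression is preferred, the same idea can probably be folded into ave() directly. This is only a sketch, assuming the group and x vectors from the question; the argument name v and the result names are just illustrative:

```r
group <- rep(1:6, each = 2)
x <- 1:12

# Position within the group, zeroed out when no member is a multiple of 3
by_position <- ave(x, group, FUN = function(v) seq_along(v) * any(v %% 3 == 0))

# Closer to the original cumsum idea: repeat the flag to the group's
# length first, so that cumsum() has something to accumulate
by_cumsum <- ave(x, group, FUN = function(v) cumsum(rep(any(v %% 3 == 0), length(v))))

by_position
## [1] 0 0 1 2 1 2 0 0 1 2 1 2
```

Both variants should give the same vector; the second one keeps cumsum in the expression, at the cost of an explicit rep().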
Or, within a data frame, with dplyr:
library("dplyr")
d <- data.frame(g, x)
d %>% group_by(g) %>% mutate(y = f(x))
## # A tibble: 12 × 3
## # Groups:   g [6]
##    g         x     y
##    <fct> <int> <int>
##  1 1         1     0
##  2 1         2     0
##  3 2         3     1
##  4 2         4     2
##  5 3         5     1
##  6 3         6     2
##  7 4         7     0
##  8 4         8     0
##  9 5         9     1
## 10 5        10     2
## 11 6        11     1
## 12 6        12     2
CodePudding user response:
You're not actually doing a cumsum--nothing needs to be summed. You are looking for the row number within the group.
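As for why the original attempt gives the same values with and without cumsum: any() collapses each group to a single logical value, so cumsum() receives a vector of length one and has nothing to accumulate. That single value is then recycled across the group by ave() or mutate(). A small sketch of this, assuming the vectors from the question (the name flag is only for illustration):

```r
group <- rep(1:6, each = 2)
x <- 1:12

# any() reduces the whole group to one logical value
flag <- any(x[group == 2] %% 3 == 0)
length(flag)
## [1] 1

# cumsum() of a length-one vector is just that value
cumsum(flag)
## [1] 1

# Repeating the flag to the group's length gives cumsum() something to sum
cumsum(rep(flag, sum(group == 2)))
## [1] 1 2
```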
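Here are a couple of ways with dplyr:
library(dplyr)
# Sample data matching the question; the value column is named y here
df <- data.frame(group = rep(1:6, each = 2), y = 1:12)
df %>%
  group_by(group) %>%
  mutate(
    # Row number within the group, multiplied by 0 or 1
    result1 = row_number() * any(y %% 3 == 0),
    # Same result, with the condition spelled out
    result2 = case_when(
      any(y %% 3 == 0) ~ row_number(),
      TRUE ~ 0L
    )
  )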
# # A tibble: 12 × 4
# # Groups:   group [6]
#    group     y result1 result2
#    <int> <int>   <int>   <int>
#  1     1     1       0       0
#  2     1     2       0       0
#  3     2     3       1       1
#  4     2     4       2       2
#  5     3     5       1       1
#  6     3     6       2       2
#  7     4     7       0       0
#  8     4     8       0       0
#  9     5     9       1       1
# 10     5    10       2       2
# 11     6    11       1       1
# 12     6    12       2       2
