i have a data base with dates and numeric variables. I also have multiple rows per id. It looks like this:
| ID | date | x |
|---|---|---|
| 1 | 2019-01-01 | 3 |
| 1 | 2018-12-01 | 4 |
| 1 | 2017-11-01 | 1 |
| 1 | 2017-10-01 | 2 |
| 1 | 2017-09-01 | 2 |
| 1 | 2017-08-01 | 2 |
I need to sum x up to six month ago from date, so i tried this
library(lubridate)
mutate(semester= semester(fecha_inicio,with_year = TRUE)) %>%
group_by(ID,semester) %>%
mutate(sum_semester = sum(x, na.rm = TRUE))
but is not what i need because 2019-01-01 have 3 instead of 14.
Please help.
CodePudding user response:
I found the answer here Cumulative sum from a month ago until the current day for all the rows adapting the code:
library(tidyverse)
library(lubridate)
data <- data %>%
group_by(ID) %>%
mutate(sum_6m = map_dbl(1:n(), ~ sum(x[(date>= (date[.] - months(5))) &
(date<= date[.])], na.rm = TRUE)))
CodePudding user response:
Using the classic way to aggregate datasets x ~ id by a function like sum with a normal data.frame filter it could be the following code.
Code
library(lubridate)
# Define your data
data <-
"id date x
1 2019-01-01 3
1 2018-12-01 4
1 2017-11-01 1
1 2017-10-01 2
1 2018-02-12 2
1 2017-09-01 2
"
# Read the table
tab <- read.csv(text=data, header = T, sep=' ')
# Find the youngest date
top.date <- as.Date(max(tab$date))
# Calculate the threshold (before and after point) of 6 month
thresh <- top.date) %m-% months(6)
# Calculate the sums over the ID's after the point date
after.thresh <- aggregate(x ~ id,
data = tab[as.Date(tab$date) >= thresh,],
FUN = sum)
# Calculate the sums over the ID's before the point date
before.thresh <- aggregate(x ~ id,
data = tab[as.Date(tab$date) < thresh,],
FUN=sum)
# Print the dates
cat("TOP.DATE.IS:", format_ISO8601(top.date),
" THRESH.DATE.IS:", format_ISO8601(thresh),"\n")
# Print the sums
cat("SUM.BEFORE.THRESH:", after.thresh$x,
"SUM.AFTER.THRESH:", before.thresh$x,"\n")
Results
TOP.DATE.IS: 2019-01-01 THRESH.DATE.IS: 2018-07-01
SUM.BEFORE.THRESH: 7 SUM.AFTER.THRESH: 7
