I am struggling with finding the correct way of achieving the relative return within a month using the last observation in the previous month. Data for reference:
set.seed(123)
Date = seq(as.Date("2021/12/31"), by = "day", length.out = 90)
Returns = runif(90, min=-0.02, max = 0.02)
mData = data.frame(Date, Returns)
Then, I would like to have a return column. For example: When calculating the returns for February the third, then it should be the returns for the respective dates: 2022-02-03 / 2022-01-31 - 1. And likewise for e.g March the third: 2022-03-03 / 2022-02-28 -1. So the question is, how can I keep the date returns within a month as the numerator while having the last observation in the previous month as the denominator using dplyr?
CodePudding user response:
library(tidyverse)
library(lubridate)
set.seed(123)
Date = seq(as.Date("2021/12/31"), by = "day", length.out = 90)
Returns = runif(90, min=-0.02, max = 0.02)
mData = data.frame(Date, Returns)
mData |>
group_by(month(Date)) |>
mutate(last_return = last(Returns)) |>
ungroup() |>
nest(data = c(Date, Returns)) |>
mutate(last_return_lag = lag(last_return)) |>
unnest(data) |>
mutate(x = Returns/last_return_lag)
#> # A tibble: 90 × 6
#> `month(Date)` last_return Date Returns last_return_lag x
#> <dbl> <dbl> <date> <dbl> <dbl> <dbl>
#> 1 12 -0.00850 2021-12-31 -0.00850 NA NA
#> 2 1 0.0161 2022-01-01 0.0115 -0.00850 -1.36
#> 3 1 0.0161 2022-01-02 -0.00364 -0.00850 0.429
#> 4 1 0.0161 2022-01-03 0.0153 -0.00850 -1.80
#> 5 1 0.0161 2022-01-04 0.0176 -0.00850 -2.07
#> 6 1 0.0161 2022-01-05 -0.0182 -0.00850 2.14
#> 7 1 0.0161 2022-01-06 0.00112 -0.00850 -0.132
#> 8 1 0.0161 2022-01-07 0.0157 -0.00850 -1.85
#> 9 1 0.0161 2022-01-08 0.00206 -0.00850 -0.242
#> 10 1 0.0161 2022-01-09 -0.00174 -0.00850 0.204
#> # … with 80 more rows
Created on 2022-02-03 by the reprex package (v2.0.1)
CodePudding user response:
Using a tmp column to get the previous value from the last month (assuming sorted data) and then picking the first. Grouping is done on year-month in group_by.
mData %>%
mutate(tmp=lag(Returns)) %>%
group_by(dat=strftime(Date, format="%Y-%m")) %>%
mutate(tmp=first(tmp), result=Returns/tmp-1) %>%
ungroup() %>%
select(-c(tmp, dat))
# A tibble: 90 × 5 # before select:
Date Returns result # tmp dat
<date> <dbl> <dbl> # <dbl> <chr>
1 2021-12-31 -0.00850 NA # NA 2021-12
2 2022-01-01 0.0115 -2.36 # -0.00850 2022-01
3 2022-01-02 -0.00364 -0.571 # -0.00850 2022-01
4 2022-01-03 0.0153 -2.80 # -0.00850 2022-01
5 2022-01-04 0.0176 -3.07 # -0.00850 2022-01
6 2022-01-05 -0.0182 1.14 # -0.00850 2022-01
7 2022-01-06 0.00112 -1.13 # -0.00850 2022-01
8 2022-01-07 0.0157 -2.85 # -0.00850 2022-01
9 2022-01-08 0.00206 -1.24 # -0.00850 2022-01
10 2022-01-09 -0.00174 -0.796 # -0.00850 2022-01
# … with 80 more rows
