Home > Blockchain >  Dividing a number within a month with the last observation in the previous month using dplyr
Dividing a number within a month with the last observation in the previous month using dplyr

Time:02-04

I am struggling with finding the correct way of achieving the relative return within a month using the last observation in the previous month. Data for reference:

set.seed(123)
Date = seq(as.Date("2021/12/31"), by = "day", length.out = 90)
Returns = runif(90, min=-0.02, max = 0.02)
mData = data.frame(Date, Returns)

Then, I would like to have a return column. For example: When calculating the returns for February the third, then it should be the returns for the respective dates: 2022-02-03 / 2022-01-31 - 1. And likewise for e.g March the third: 2022-03-03 / 2022-02-28 -1. So the question is, how can I keep the date returns within a month as the numerator while having the last observation in the previous month as the denominator using dplyr?

CodePudding user response:

library(tidyverse)
library(lubridate)

set.seed(123)
Date = seq(as.Date("2021/12/31"), by = "day", length.out = 90)
Returns = runif(90, min=-0.02, max = 0.02)
mData = data.frame(Date, Returns)

mData |> 
  group_by(month(Date)) |> 
  mutate(last_return = last(Returns)) |>
  ungroup() |> 
  nest(data = c(Date, Returns)) |> 
  mutate(last_return_lag = lag(last_return)) |> 
  unnest(data) |> 
  mutate(x = Returns/last_return_lag)
#> # A tibble: 90 × 6
#>    `month(Date)` last_return Date        Returns last_return_lag      x
#>            <dbl>       <dbl> <date>        <dbl>           <dbl>  <dbl>
#>  1            12    -0.00850 2021-12-31 -0.00850        NA       NA    
#>  2             1     0.0161  2022-01-01  0.0115         -0.00850 -1.36 
#>  3             1     0.0161  2022-01-02 -0.00364        -0.00850  0.429
#>  4             1     0.0161  2022-01-03  0.0153         -0.00850 -1.80 
#>  5             1     0.0161  2022-01-04  0.0176         -0.00850 -2.07 
#>  6             1     0.0161  2022-01-05 -0.0182         -0.00850  2.14 
#>  7             1     0.0161  2022-01-06  0.00112        -0.00850 -0.132
#>  8             1     0.0161  2022-01-07  0.0157         -0.00850 -1.85 
#>  9             1     0.0161  2022-01-08  0.00206        -0.00850 -0.242
#> 10             1     0.0161  2022-01-09 -0.00174        -0.00850  0.204
#> # … with 80 more rows

Created on 2022-02-03 by the reprex package (v2.0.1)

CodePudding user response:

Using a tmp column to get the previous value from the last month (assuming sorted data) and then picking the first. Grouping is done on year-month in group_by.

mData %>% 
  mutate(tmp=lag(Returns)) %>% 
  group_by(dat=strftime(Date, format="%Y-%m")) %>% 
  mutate(tmp=first(tmp), result=Returns/tmp-1) %>% 
  ungroup() %>% 
  select(-c(tmp, dat))
# A tibble: 90 × 5            # before select:
   Date        Returns result #        tmp dat    
   <date>        <dbl>  <dbl> #    <dbl> <chr>  
 1 2021-12-31 -0.00850 NA     # NA       2021-12
 2 2022-01-01  0.0115  -2.36  # -0.00850 2022-01
 3 2022-01-02 -0.00364 -0.571 # -0.00850 2022-01
 4 2022-01-03  0.0153  -2.80  # -0.00850 2022-01
 5 2022-01-04  0.0176  -3.07  # -0.00850 2022-01
 6 2022-01-05 -0.0182   1.14  # -0.00850 2022-01
 7 2022-01-06  0.00112 -1.13  # -0.00850 2022-01
 8 2022-01-07  0.0157  -2.85  # -0.00850 2022-01
 9 2022-01-08  0.00206 -1.24  # -0.00850 2022-01
10 2022-01-09 -0.00174 -0.796 # -0.00850 2022-01
# … with 80 more rows
  •  Tags:  
  • Related