Home > Back-end >  Carry last observation forward (na.locf?) - but only when another column value changes
Carry last observation forward (na.locf?) - but only when another column value changes

Time:01-25

Background: I watch a portfolio consiting of weights in different ISIN codes. On a monthly basis I can observe how the weights in each ISIN code changes. The changes in the weights is always the last observation of the month.

Goal: My goal is to span the monthly weights out on a daily basis. Thus, the observation I have in the end of a month must be carried forward and span out on all the daily observations in the next month - until a new weight is observed.

A little problem i have incurred: The biggest problem with this, is that I cannot just use "na.locf" - because sometimes a ISIN will be sold out of the portfolio, hence the weight becomes 0 instead. So, in order to overcome I have tried to create a "helping column" in order to achieve my goal. This column only tracks the year and month we are in. This helper column is called "ym" in the code below.

A reproducible code:

df = data.frame(
  Date = as.POSIXlt(c("29-01-2021"  , "01-02-2021", "02-02-2021", "03-02-2021", "04-02-2021", "05-02-2021",
           "08-02-2021", "09-02-2021", "10-02-2021" , "11-02-2021", "12-02-2021" , "15-02-2021",
           "16-02-2021",    "17-02-2021",   "18-02-2021", "19-02-2021", "22-02-2021", "23-02-2021", 
           "24-02-2021", "25-02-2021", "26-02-2021" , "01-03-2021", "02-03-2021", "03-03-2021",
           "04-03-2021",    "05-03-2021",   "08-03-2021" , "09-03-2021", "10-03-2021", "11-03-2021",    "12-03-2021"), format = "%d-%m-%Y"),
  DK0004612884 = c(14.48, NA, NA, NA, NA,NA, NA,NA, NA,NA, NA,NA, NA,NA,
                   NA,NA, NA,NA, NA,NA, 14.15,NA, NA,NA, NA, NA, NA, NA, NA, NA, NA),
  DK0002025170 = c(0.58, NA, NA, NA, NA,NA, NA,NA, NA,NA, NA,NA, NA,NA,
                   NA,NA, NA,NA, NA,NA, NA ,NA, NA,NA, NA, NA, NA, NA, NA, NA, NA)
  
)
df$ym = paste0(year(df$Date), "-", month(df$Date))
print.data.frame(df)
        Date DK0004612884 DK0002025170     ym
1  2021-01-29        14.48         0.58 2021-1
2  2021-02-01           NA           NA 2021-2
3  2021-02-02           NA           NA 2021-2
4  2021-02-03           NA           NA 2021-2
5  2021-02-04           NA           NA 2021-2
6  2021-02-05           NA           NA 2021-2
7  2021-02-08           NA           NA 2021-2
8  2021-02-09           NA           NA 2021-2
9  2021-02-10           NA           NA 2021-2
10 2021-02-11           NA           NA 2021-2
11 2021-02-12           NA           NA 2021-2
12 2021-02-15           NA           NA 2021-2
13 2021-02-16           NA           NA 2021-2
14 2021-02-17           NA           NA 2021-2
15 2021-02-18           NA           NA 2021-2
16 2021-02-19           NA           NA 2021-2
17 2021-02-22           NA           NA 2021-2
18 2021-02-23           NA           NA 2021-2
19 2021-02-24           NA           NA 2021-2
20 2021-02-25           NA           NA 2021-2
21 2021-02-26        14.15           NA 2021-2
22 2021-03-01           NA           NA 2021-3
23 2021-03-02           NA           NA 2021-3
24 2021-03-03           NA           NA 2021-3
25 2021-03-04           NA           NA 2021-3
26 2021-03-05           NA           NA 2021-3
27 2021-03-08           NA           NA 2021-3
28 2021-03-09           NA           NA 2021-3
29 2021-03-10           NA           NA 2021-3
30 2021-03-11           NA           NA 2021-3
31 2021-03-12           NA           NA 2021-3

See the following: From date 2021-01-29 until 2021-02-26 the observations of 14.48 and 0.58 must be carried forward until date: "2021-02-26 ". So here the weight for both ISIN changes. But, see that I sell out the DK0002025170 ISIN. Thus, every value from here (until a potential new weight is seen) must be 0. While the new weight of 14.15 belonging to DK0004612884 must be carried forward until a new weight is seen.

CodePudding user response:

You can use fill:

df %>% 
  mutate(sum = rowSums(!is.na(select(., DK0004612884:DK0002025170))),
         across(DK0004612884:DK0002025170, ~ case_when(sum > 0 ~ replace_na(.x, 0)))) %>% 
  fill(DK0004612884:DK0002025170)

output

         Date DK0004612884 DK0002025170     ym sum
1  2021-01-29        14.48         0.58 2021-1   2
2  2021-02-01        14.48         0.58 2021-2   0
3  2021-02-02        14.48         0.58 2021-2   0
4  2021-02-03        14.48         0.58 2021-2   0
5  2021-02-04        14.48         0.58 2021-2   0
6  2021-02-05        14.48         0.58 2021-2   0
7  2021-02-08        14.48         0.58 2021-2   0
8  2021-02-09        14.48         0.58 2021-2   0
9  2021-02-10        14.48         0.58 2021-2   0
10 2021-02-11        14.48         0.58 2021-2   0
11 2021-02-12        14.48         0.58 2021-2   0
12 2021-02-15        14.48         0.58 2021-2   0
13 2021-02-16        14.48         0.58 2021-2   0
14 2021-02-17        14.48         0.58 2021-2   0
15 2021-02-18        14.48         0.58 2021-2   0
16 2021-02-19        14.48         0.58 2021-2   0
17 2021-02-22        14.48         0.58 2021-2   0
18 2021-02-23        14.48         0.58 2021-2   0
19 2021-02-24        14.48         0.58 2021-2   0
20 2021-02-25        14.48         0.58 2021-2   0
21 2021-02-26        14.15         0.00 2021-2   1
22 2021-03-01        14.15         0.00 2021-3   0
23 2021-03-02        14.15         0.00 2021-3   0
24 2021-03-03        14.15         0.00 2021-3   0
25 2021-03-04        14.15         0.00 2021-3   0
26 2021-03-05        14.15         0.00 2021-3   0
27 2021-03-08        14.15         0.00 2021-3   0
28 2021-03-09        14.15         0.00 2021-3   0
29 2021-03-10        14.15         0.00 2021-3   0
30 2021-03-11        14.15         0.00 2021-3   0
31 2021-03-12        14.15         0.00 2021-3   0
  •  Tags:  
  • Related