Given a panel data set as follow:
df <- structure(list(date = c(1999L, 2000L, 2001L, 1999L, 2000L, 2001L,
1999L, 2000L, 2001L), firms = c("A", "A", "A", "B", "B", "B",
"C", "C", "C"), return = c(5L, NA, 6L, 9L, NA, 10L, 8L, NA, 3L
)), class = "data.frame", row.names = c(NA, -9L))
I'm able to groupby firms and back fill return using df %>% group_by(firms) %>% fill(return, .direction="up"):
date firms return
<int> <chr> <int>
1 1999 A 5
2 2000 A 6
3 2001 A 6
4 1999 B 9
5 2000 B 10
6 2001 B 10
7 1999 C 8
8 2000 C 3
9 2001 C 3
But I wonder how could back fill certain groups ie., B and C?
The expected outcome:
date firms return
1 1999 A 5
2 2000 A NA
3 2001 A 6
4 1999 B 9
5 2000 B 10
6 2001 B 10
7 1999 C 8
8 2000 C 3
9 2001 C 3
CodePudding user response:
One option would be to create a second column, duplicating only the groups that you wanted to fill. Then, I use coalesce to combine the two columns together.
library(tidyverse)
df %>%
mutate(return2 = ifelse(firms %in% c("B", "C"), return, NA)) %>%
group_by(firms) %>%
fill(return2, .direction="up") %>%
mutate(return = coalesce(return, return2)) %>%
select(-return2)
Another option is to create a new dataframe with the groups that you want to fill, then join the data back to the original dataframe. Then, I apply coalesce to the two columns that start with "return".
df %>%
filter(firms != "A") %>%
group_by(firms) %>%
fill(return, .direction="up") %>%
left_join(df, ., by = c("date", "firms")) %>%
mutate(return = coalesce(!!!select(., starts_with("return")))) %>%
select(-c(return.x, return.y))
Another option is to split the dataframe by groups into a list of tibbles. Then, I select the groups to fill, then bind back together.
df %>%
group_split(firms, .keep = TRUE) %>%
map_at(c(2:3), fill, return, .direction="up") %>%
map_dfr(., bind_rows)
Output
date firms return
<int> <chr> <int>
1 1999 A 5
2 2000 A NA
3 2001 A 6
4 1999 B 9
5 2000 B 10
6 2001 B 10
7 1999 C 8
8 2000 C 3
9 2001 C 3
CodePudding user response:
This is a simple and intuitive solution:
library(data.table)
df %>% group_by(firms) %>%
mutate(return = ifelse(firms %in% c("B", "C"), nafill(return, type ="locf"), return))
>
date firms return
1 1999 A 5
2 2000 A NA
3 2001 A 6
4 1999 B 9
5 2000 B 9
6 2001 B 10
7 1999 C 8
8 2000 C 8
9 2001 C 3
>
CodePudding user response:
You can subset observations and then mutate columns
library(data.table)
df <- data.table(df)
df[firms %in% c("B", "C"), return := nafill(return, type = "nocb"), by = firms]
