Home > Mobile >  How to groupby and back-fill only certain groups
How to groupby and back-fill only certain groups

Time:01-18

Given a panel data set as follow:

df <- structure(list(date = c(1999L, 2000L, 2001L, 1999L, 2000L, 2001L, 
1999L, 2000L, 2001L), firms = c("A", "A", "A", "B", "B", "B", 
"C", "C", "C"), return = c(5L, NA, 6L, 9L, NA, 10L, 8L, NA, 3L
)), class = "data.frame", row.names = c(NA, -9L))

I'm able to groupby firms and back fill return using df %>% group_by(firms) %>% fill(return, .direction="up"):

   date firms return
  <int> <chr>  <int>
1  1999 A          5
2  2000 A          6
3  2001 A          6
4  1999 B          9
5  2000 B         10
6  2001 B         10
7  1999 C          8
8  2000 C          3
9  2001 C          3

But I wonder how could back fill certain groups ie., B and C?

The expected outcome:

  date firms return
1 1999     A      5
2 2000     A     NA
3 2001     A      6
4 1999     B      9
5 2000     B     10
6 2001     B     10
7 1999     C      8
8 2000     C      3
9 2001     C      3

CodePudding user response:

One option would be to create a second column, duplicating only the groups that you wanted to fill. Then, I use coalesce to combine the two columns together.

library(tidyverse)

df %>% 
  mutate(return2 = ifelse(firms %in% c("B", "C"), return, NA)) %>% 
  group_by(firms) %>% 
  fill(return2, .direction="up") %>% 
  mutate(return = coalesce(return, return2)) %>% 
  select(-return2)

Another option is to create a new dataframe with the groups that you want to fill, then join the data back to the original dataframe. Then, I apply coalesce to the two columns that start with "return".

df %>% 
  filter(firms != "A") %>% 
  group_by(firms) %>% 
  fill(return, .direction="up") %>% 
  left_join(df, ., by = c("date", "firms")) %>% 
  mutate(return = coalesce(!!!select(., starts_with("return")))) %>% 
  select(-c(return.x, return.y))

Another option is to split the dataframe by groups into a list of tibbles. Then, I select the groups to fill, then bind back together.

df %>%
    group_split(firms, .keep = TRUE) %>% 
    map_at(c(2:3), fill, return, .direction="up") %>% 
    map_dfr(., bind_rows)

Output

   date firms return
  <int> <chr>  <int>
1  1999 A          5
2  2000 A         NA
3  2001 A          6
4  1999 B          9
5  2000 B         10
6  2001 B         10
7  1999 C          8
8  2000 C          3
9  2001 C          3

CodePudding user response:

This is a simple and intuitive solution:

library(data.table)
df %>% group_by(firms) %>% 
mutate(return = ifelse(firms %in% c("B", "C"), nafill(return, type ="locf"), return))

>
  date firms return
1 1999     A      5
2 2000     A     NA
3 2001     A      6
4 1999     B      9
5 2000     B      9
6 2001     B     10
7 1999     C      8
8 2000     C      8
9 2001     C      3
> 

CodePudding user response:

You can subset observations and then mutate columns

library(data.table)
df <- data.table(df)
df[firms %in% c("B", "C"), return := nafill(return, type = "nocb"), by = firms]
  •  Tags:  
  • Related