Home > database >  grouping to aggregate values, but tripping up on NA's
grouping to aggregate values, but tripping up on NA's

Time:01-20

I have long data, and I am trying to make a new variable (consistent) that is the value for a given column (VALUE), for each person (ID), at TIME = 2. I used the code below to do this, but I am getting tripped up on NA's. If the VALUE for TIME = 2 is NA, then I want it to grab the VALUE at TIME = 1 instead. That part I'm not sure how to do. So, in the example below, I want the new variable (consistent) should be 10 instead of NA.

ID = c("A", "A", "B", "B", "C", "C", "D", "D")
TIME = c(1, 2, 1, 2, 1, 2, 1, 2)
VALUE = c(8, 9, 10, NA, 12, 13, 14, 9)

df = data.frame(ID,  TIME, VALUE)

df <- df %>%
  group_by(ID) %>%
  mutate(consistent = VALUE[TIME == 2]) %>% ungroup  

df

CodePudding user response:

If we want to use the same code, then coalesce with the 'VALUE' where 'TIME' is 1 (assuming there is a single observation of 'TIME' for each 'ID')

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(consistent = coalesce(VALUE[TIME == 2], VALUE[TIME == 1])) %>% 
  ungroup

-output

# A tibble: 8 × 4
  ID     TIME VALUE consistent
  <chr> <dbl> <dbl>      <dbl>
1 A         1     8          9
2 A         2     9          9
3 B         1    10         10
4 B         2    NA         10
5 C         1    12         13
6 C         2    13         13
7 D         1    14          9
8 D         2     9          9

Or another option is to arrange before doing the group_by and get the first element of 'VALUE' (assuming no replicating for 'TIME')

df %>%
  arrange(ID, is.na(VALUE), desc(TIME)) %>% 
  group_by(ID) %>% 
  mutate(consistent = first(VALUE)) %>%
  ungroup

-output

# A tibble: 8 × 4
  ID     TIME VALUE consistent
  <chr> <dbl> <dbl>      <dbl>
1 A         2     9          9
2 A         1     8          9
3 B         1    10         10
4 B         2    NA         10
5 C         2    13         13
6 C         1    12         13
7 D         2     9          9
8 D         1    14          9

CodePudding user response:

Another possible solution, using tidyr::fill:

library(tidyverse)

df %>%
  group_by(ID) %>%
  mutate(consistent = VALUE) %>% fill(consistent) %>% ungroup  

#> # A tibble: 8 × 4
#>   ID     TIME VALUE consistent
#>   <chr> <dbl> <dbl>      <dbl>
#> 1 A         1     8          8
#> 2 A         2     9          9
#> 3 B         1    10         10
#> 4 B         2    NA         10
#> 5 C         1    12         12
#> 6 C         2    13         13
#> 7 D         1    14         14
#> 8 D         2     9          9

CodePudding user response:

You can also use ifelse with your condition. TIME is guaranteed to be 1 in this scenario if there are only 2 group member each with TIME 1 and 2.

df %>% 
  group_by(ID) %>% 
  arrange(TIME, .by_group=T) %>%
  mutate(consistent=ifelse(is.na(VALUE)&TIME==2, lag(VALUE), VALUE)) %>% 
  ungroup()
# A tibble: 8 × 4
  ID     TIME VALUE consistent
  <chr> <dbl> <dbl>      <dbl>
1 A         1     8          8
2 A         2     9          9
3 B         1    10         10
4 B         2    NA         10
5 C         1    12         12
6 C         2    13         13
7 D         1    14         14
8 D         2     9          9
  •  Tags:  
  • Related