Home > Back-end >  How to copy a character string in a dataframe row to all subsequent rows sharing the same ID?
How to copy a character string in a dataframe row to all subsequent rows sharing the same ID?

Time:01-09

Suppose we start with the data frame df shown below:

  ID Flag
1  1 NULL
2  1 NULL
3  1  FRY
4  1  CRY
5  1 NULL
6  5  CRY
7  5 NULL
8  5 NULL

ID <- c(1, 1, 1, 1,1, 5, 5, 5)
  Flag <- c("NULL","NULL","FRY","CRY","NULL","CRY","NULL","NULL")
  df <- data.frame(ID, Flag)
  df

I would like to change the "Flag" column so that the first time a Flag row is not "NULL" for a given ID, then that non-NULL item is copied down to all remaining rows for that same ID. So we would end up with the following data frame:

  ID Flag  [Explain]
1  1 NULL   
2  1 NULL
3  1  FRY   First row for ID 1 where Flag <> NULL, so apply row 3 FRY to all subsequent rows for ID 1
4  1  FRY   Override original row 4 CRY since FRY came first
5  1  FRY   FRY rules for all remaining ID = 1 rows
6  5  CRY   First row for ID 5 where Flag <> NULL, so apply row 1 CRY to all subsequent rows for ID 5
7  5  CRY   CRY rules for all remaining ID = 5 rows
8  5  CRY

How would this be accomplished using dplyr? I've been fiddling with group(), fill(), coalesce(), but am stumbling.

CodePudding user response:

Here's a way to do this with the help of a small helper function.

library(dplyr)

fill_values <- function(x) {
  #get the position of first non-NULL value
  inds <- match(TRUE, x != "NULL")
  #replace all the value from that position till end
  #with the first non-NULL value. 
  x[inds:length(x)] <- x[inds]
  x
}

#apply the function for each group (ID).
df %>%
  group_by(ID) %>%
  mutate(Flag = fill_values(Flag)) %>%
  ungroup

#     ID Flag 
#  <dbl> <chr>
#1     1 NULL 
#2     1 NULL 
#3     1 FRY  
#4     1 FRY  
#5     1 FRY  
#6     5 CRY  
#7     5 CRY  
#8     5 CRY  

CodePudding user response:

Using tidyr::fill and some additional data wrangling you could do:

library(dplyr)
library(tidyr)

df %>% 
  group_by(ID) %>% 
  mutate(Flag = ifelse(Flag != "NULL", first(Flag[Flag != "NULL"]), NA_character_)) %>% 
  fill(Flag) %>% 
  replace_na(list(Flag = "NULL")) %>% 
  ungroup()
#> # A tibble: 8 × 2
#>      ID Flag 
#>   <dbl> <chr>
#> 1     1 NULL 
#> 2     1 NULL 
#> 3     1 FRY  
#> 4     1 FRY  
#> 5     1 FRY  
#> 6     5 CRY  
#> 7     5 CRY  
#> 8     5 CRY

CodePudding user response:

# Base R solution: 
transform(
  df,
  Flag = ave(
    Flag,
    ID,
    FUN = function(x){
      ifelse(
        cumsum(x != "NULL") > 0,
        x[min(which(x != "NULL"))],
        "NULL"
      )
    }
  )
)


# Dplyr solution: 
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(
    Flag = if_else(
      cumsum(Flag != "NULL") > 0,
      first(Flag[which(Flag != "NULL")]),
      "NULL"
    )
  ) %>%
  ungroup()

# data.table solution: 
library(data.table)
dt <- data.table(df)
dt[, Flag := ifelse(
  cumsum(Flag != "NULL") > 0,
  Flag[min(which(Flag != "NULL"))],
  "NULL"),
  by = ID
]
  •  Tags:  
  • Related