Suppose we start with the data frame df shown below:
ID Flag
1 1 NULL
2 1 NULL
3 1 FRY
4 1 CRY
5 1 NULL
6 5 CRY
7 5 NULL
8 5 NULL
ID <- c(1, 1, 1, 1,1, 5, 5, 5)
Flag <- c("NULL","NULL","FRY","CRY","NULL","CRY","NULL","NULL")
df <- data.frame(ID, Flag)
df
I would like to change the "Flag" column so that the first time a Flag row is not "NULL" for a given ID, then that non-NULL item is copied down to all remaining rows for that same ID. So we would end up with the following data frame:
ID Flag [Explain]
1 1 NULL
2 1 NULL
3 1 FRY First row for ID 1 where Flag <> NULL, so apply row 3 FRY to all subsequent rows for ID 1
4 1 FRY Override original row 4 CRY since FRY came first
5 1 FRY FRY rules for all remaining ID = 1 rows
6 5 CRY First row for ID 5 where Flag <> NULL, so apply row 1 CRY to all subsequent rows for ID 5
7 5 CRY CRY rules for all remaining ID = 5 rows
8 5 CRY
How would this be accomplished using dplyr? I've been fiddling with group(), fill(), coalesce(), but am stumbling.
CodePudding user response:
Here's a way to do this with the help of a small helper function.
library(dplyr)
fill_values <- function(x) {
#get the position of first non-NULL value
inds <- match(TRUE, x != "NULL")
#replace all the value from that position till end
#with the first non-NULL value.
x[inds:length(x)] <- x[inds]
x
}
#apply the function for each group (ID).
df %>%
group_by(ID) %>%
mutate(Flag = fill_values(Flag)) %>%
ungroup
# ID Flag
# <dbl> <chr>
#1 1 NULL
#2 1 NULL
#3 1 FRY
#4 1 FRY
#5 1 FRY
#6 5 CRY
#7 5 CRY
#8 5 CRY
CodePudding user response:
Using tidyr::fill and some additional data wrangling you could do:
library(dplyr)
library(tidyr)
df %>%
group_by(ID) %>%
mutate(Flag = ifelse(Flag != "NULL", first(Flag[Flag != "NULL"]), NA_character_)) %>%
fill(Flag) %>%
replace_na(list(Flag = "NULL")) %>%
ungroup()
#> # A tibble: 8 × 2
#> ID Flag
#> <dbl> <chr>
#> 1 1 NULL
#> 2 1 NULL
#> 3 1 FRY
#> 4 1 FRY
#> 5 1 FRY
#> 6 5 CRY
#> 7 5 CRY
#> 8 5 CRY
CodePudding user response:
# Base R solution:
transform(
df,
Flag = ave(
Flag,
ID,
FUN = function(x){
ifelse(
cumsum(x != "NULL") > 0,
x[min(which(x != "NULL"))],
"NULL"
)
}
)
)
# Dplyr solution:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(
Flag = if_else(
cumsum(Flag != "NULL") > 0,
first(Flag[which(Flag != "NULL")]),
"NULL"
)
) %>%
ungroup()
# data.table solution:
library(data.table)
dt <- data.table(df)
dt[, Flag := ifelse(
cumsum(Flag != "NULL") > 0,
Flag[min(which(Flag != "NULL"))],
"NULL"),
by = ID
]
