I have a large data frame with including several trials. I want a new column with the best trial out of the three. They are strings and should be ordered as good, fair and bad.
df <- data.frame(id = c(1:5),
filler = c("You", "can", "ignore", "this", "one"),
trial1 = c("good", NA, "good", "fair", NA),
trial2 = c("bad", "bad", "fair", "NA", NA),
trial3 = c("bad", "bad", "bad", "bad", NA))
In this example, this would mean for the new column and each ID: good, bad, good, fair, NA
CodePudding user response:
We may use rowwise to loop over the rows, convert the columns that starts_with to factor with levels specified in the custom order and extract the first of the levels after dropping the levels (droplevels)
library(dplyr)
df <- df %>%
rowwise %>%
mutate(new = first(levels(droplevels(factor(c_across(starts_with('trial')),
levels = c('good', 'fair', 'bad')))))) %>%
ungroup
-output
df
# A tibble: 5 × 6
id filler trial1 trial2 trial3 new
<int> <chr> <chr> <chr> <chr> <chr>
1 1 You good bad bad good
2 2 can <NA> bad bad bad
3 3 ignore good fair bad good
4 4 this fair NA bad fair
5 5 one <NA> <NA> <NA> <NA>
Or using pmap
library(purrr)
df %>%
mutate(new = pmap_chr(across(starts_with('trial')),
~ .x[na.omit(match(c('good', 'fair', 'bad'), .x))[1]]))
id filler trial1 trial2 trial3 new
1 1 You good bad bad good
2 2 can <NA> bad bad <NA>
3 3 ignore good fair bad good
4 4 this fair NA bad fair
5 5 one <NA> <NA> <NA> <NA>
Or using collapse
library(collapse)
df$new <- dapply(gvr(df, 'trial\\d '), MARGIN = 1,
FUN = \(x) x[na.omit(match(c('good', 'fair', 'bad'), x))[1]])
CodePudding user response:
Here is a way solving it with pivoting combined with a case_when statement:
library(tidyr)
library(dplyr)
df %>%
pivot_longer(starts_with("trial")) %>%
group_by(id) %>%
mutate(new = case_when("good" %in% value ~ "good",
"fair" %in% value &
!("good" %in% value) ~ "fair",
"bad" %in% value &
!("fair" %in% value) &
!("good" %in% value) ~ "bad",
TRUE ~ NA_character_)) %>%
pivot_wider() %>%
ungroup()
id filler new trial1 trial2 trial3
<int> <chr> <chr> <chr> <chr> <chr>
1 1 You good good bad bad
2 2 can bad NA bad bad
3 3 ignore good good fair bad
4 4 this fair fair NA bad
5 5 one NA NA NA NA
CodePudding user response:
Here's a base R way to do this -
cols <- grep('trial', names(df))
df[cols] <- lapply(df[cols], factor, c('bad', 'fair', 'good'), ordered = TRUE)
df$out <- do.call(pmax, c(df[cols], na.rm = TRUE))
df
# id filler trial1 trial2 trial3 out
#1 1 You good bad bad good
#2 2 can <NA> bad bad bad
#3 3 ignore good fair bad good
#4 4 this fair <NA> bad fair
#5 5 one <NA> <NA> <NA> <NA>
We first change the trial columns to ordered factors and then take rowwise maximum of them to get the max value in each row.
