Home > Enterprise >  Find best-of value in string over multiple columns
Find best-of value in string over multiple columns

Time:01-23

I have a large data frame with including several trials. I want a new column with the best trial out of the three. They are strings and should be ordered as good, fair and bad.

df <- data.frame(id = c(1:5),
                 filler = c("You", "can", "ignore", "this", "one"),
                 trial1 = c("good", NA, "good", "fair", NA),
                 trial2 = c("bad", "bad", "fair", "NA", NA),
                 trial3 = c("bad", "bad", "bad", "bad", NA))

In this example, this would mean for the new column and each ID: good, bad, good, fair, NA

CodePudding user response:

We may use rowwise to loop over the rows, convert the columns that starts_with to factor with levels specified in the custom order and extract the first of the levels after dropping the levels (droplevels)

library(dplyr)
df <- df %>% 
  rowwise %>% 
  mutate(new = first(levels(droplevels(factor(c_across(starts_with('trial')),
       levels = c('good', 'fair', 'bad')))))) %>%
  ungroup

-output

df
# A tibble: 5 × 6
     id filler trial1 trial2 trial3 new  
  <int> <chr>  <chr>  <chr>  <chr>  <chr>
1     1 You    good   bad    bad    good 
2     2 can    <NA>   bad    bad    bad  
3     3 ignore good   fair   bad    good 
4     4 this   fair   NA     bad    fair 
5     5 one    <NA>   <NA>   <NA>   <NA> 

Or using pmap

library(purrr)
df %>% 
   mutate(new = pmap_chr(across(starts_with('trial')),
      ~ .x[na.omit(match(c('good', 'fair', 'bad'), .x))[1]]))
id filler trial1 trial2 trial3  new
1  1    You   good    bad    bad good
2  2    can   <NA>    bad    bad <NA>
3  3 ignore   good   fair    bad good
4  4   this   fair     NA    bad fair
5  5    one   <NA>   <NA>   <NA> <NA>

Or using collapse

library(collapse)
df$new <- dapply(gvr(df, 'trial\\d '), MARGIN = 1,
    FUN = \(x) x[na.omit(match(c('good', 'fair', 'bad'), x))[1]])

CodePudding user response:

Here is a way solving it with pivoting combined with a case_when statement:

library(tidyr)
library(dplyr)

df %>% 
  pivot_longer(starts_with("trial")) %>% 
  group_by(id) %>% 
  mutate(new = case_when("good" %in% value ~ "good",
                         "fair" %in% value & 
                           !("good" %in% value) ~ "fair",
                         "bad" %in% value & 
                           !("fair" %in% value) & 
                           !("good" %in% value) ~ "bad",
                         TRUE ~ NA_character_)) %>% 
  pivot_wider() %>% 
  ungroup()
     id filler new   trial1 trial2 trial3
  <int> <chr>  <chr> <chr>  <chr>  <chr> 
1     1 You    good  good   bad    bad   
2     2 can    bad   NA     bad    bad   
3     3 ignore good  good   fair   bad   
4     4 this   fair  fair   NA     bad   
5     5 one    NA    NA     NA     NA   

CodePudding user response:

Here's a base R way to do this -

cols <- grep('trial', names(df))
df[cols] <- lapply(df[cols], factor, c('bad', 'fair', 'good'), ordered = TRUE)
df$out <- do.call(pmax, c(df[cols], na.rm = TRUE))
df

#  id filler trial1 trial2 trial3  out
#1  1    You   good    bad    bad good
#2  2    can   <NA>    bad    bad  bad
#3  3 ignore   good   fair    bad good
#4  4   this   fair   <NA>    bad fair
#5  5    one   <NA>   <NA>   <NA> <NA>

We first change the trial columns to ordered factors and then take rowwise maximum of them to get the max value in each row.

  •  Tags:  
  • Related