dplyr and filter for conditional mutation but return full dataset

Time:01-18

I have been using dplyr and filter() to create a conditional mutation based on several factors. However, instead of returning only the filtered rows, I need the new column to be the same length as the original dataset.

Using the example data below, I want to create a new column called Result that labels every row of a Condition as "pass" if Value is greater than 3 for at least two distinct Names within that Condition, and "fail" otherwise. Note: I'd like to keep the data in long format.

Example data and my attempt:

library(dplyr)

data <- tibble(
  Condition = rep(c("Apple", "Banana", "Cherry", "Pear"), each = 20),
  Names = rep(c("John", "Paul", "George", "Ringo"), each = 5, times = 4),
  Value = rep(c(3, 2, 1, 2, 5, 3, 3, 4, 4, 2, 2, 6, 2, 5, 1, 1), each = 5)
)

# My attempt: filter() drops every row with Value < 3, so x has fewer rows than data
x <- data %>%
  filter(Value >= 3) %>%
  group_by(Condition) %>%
  mutate(Result = ifelse(n_distinct(Names) > 1, "pass", "fail"))
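A quick base-R check shows why this attempt cannot give a full-length column. It uses `[` row subsetting as a stand-in for `filter()`, with the same `Value >= 3` cutoff as the attempt. Every Condition survives the filter, but each group shrinks, so the result has only half as many rows as `data`. The name `filtered` is just for illustration.

```r
# Rebuild the example data (same values as the tibble above)
data <- data.frame(
  Condition = rep(c("Apple", "Banana", "Cherry", "Pear"), each = 20),
  Names = rep(c("John", "Paul", "George", "Ringo"), each = 5, times = 4),
  Value = rep(c(3, 2, 1, 2, 5, 3, 3, 4, 4, 2, 2, 6, 2, 5, 1, 1), each = 5)
)

# Base-R equivalent of filter(Value >= 3): keep only the matching rows
filtered <- data[data$Value >= 3, ]

nrow(data)                  # all rows
nrow(filtered)              # rows left after filtering
table(filtered$Condition)   # surviving rows per Condition
```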

What I'm after:

desired <- tibble(
  Condition = rep(c("Apple", "Banana", "Cherry", "Pear"), each = 20),
  Names = rep(c("John", "Paul", "George", "Ringo"), each = 5, times = 4),
  Value = rep(c(3, 2, 1, 2, 5, 3, 3, 4, 4, 2, 2, 6, 2, 5, 1, 1), each = 5),
  Result = rep(c("fail", "pass", "pass", "fail"), each = 20)
)

Thanks

Answer:

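Instead of filtering the data first, keep every row with group_by() and mutate(), and subset Names inside the call to n_distinct(). That way only the Names whose Value is greater than 3 are counted, but no rows are dropped.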
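To see what the subsetting inside `n_distinct()` does for a single group, here is a base-R sketch on the Banana rows only. `length(unique())` stands in for dplyr's `n_distinct()`, and the variable names are made up for illustration.

```r
# The Banana rows from the example data
banana_names  <- rep(c("John", "Paul", "George", "Ringo"), each = 5)
banana_values <- rep(c(5, 3, 3, 4), each = 5)

# Logical subsetting keeps only the Names whose Value exceeds 3
high_names <- banana_names[banana_values > 3]

# Count the distinct Names left and apply the "at least two" rule
n_high <- length(unique(high_names))
result <- ifelse(n_high > 1, "pass", "fail")
```

Only John and Ringo exceed 3 here, so the count is 2 and the group passes. Inside `mutate()`, this single value is recycled to all 20 Banana rows.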

library(dplyr)

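output <- data %>%
  group_by(Condition) %>%
  # Count distinct Names with Value > 3 within each Condition;
  # the single result is recycled to every row of the group
  mutate(Result = ifelse(n_distinct(Names[Value > 3]) > 1,
                         "pass",
                         "fail")) %>%
  ungroup()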


identical(output, desired)
[1] TRUE
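If you would rather not depend on dplyr, the same rule can be sketched in base R with `ave()`. This is a swapped-in technique, not part of the answer above. `ave()` applies a function within each group and always returns a vector as long as its input. `n_high` is a hypothetical helper name.

```r
# Rebuild the example data from the question
data <- data.frame(
  Condition = rep(c("Apple", "Banana", "Cherry", "Pear"), each = 20),
  Names = rep(c("John", "Paul", "George", "Ringo"), each = 5, times = 4),
  Value = rep(c(3, 2, 1, 2, 5, 3, 3, 4, 4, 2, 2, 6, 2, 5, 1, 1), each = 5)
)

# For each Condition, count the distinct Names whose Value exceeds 3.
# ave() passes each group's row indices to FUN and writes the scalar
# result back to every row of that group, so no rows are lost.
n_high <- ave(seq_len(nrow(data)), data$Condition, FUN = function(idx) {
  length(unique(data$Names[idx][data$Value[idx] > 3]))
})

data$Result <- ifelse(n_high > 1, "pass", "fail")
```

Because each group's count is written back to every row of that group, `data` keeps all 80 rows. The `group_by()` + `mutate()` version relies on the same property.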