I have been using dplyr's filter() to create a conditional mutation based on several factors. However, instead of returning only the filtered rows, I need the new column to be the same length as the original dataset.
Using the example data below, I want to create a new column called Result that labels every row of a Condition as "pass" when Value is greater than 3 for at least two distinct Names in that Condition, and as "fail" otherwise. Note: I'd like to keep the data in long format.
Example data and my attempt so far:
library(dplyr)

# 4 conditions x 4 names x 5 repeated measurements = 80 rows
data <- tibble(
  Condition = rep(c("Apple", "Banana", "Cherry", "Pear"), each = 20),
  Names = rep(rep(c("John", "Paul", "George", "Ringo"), each = 5), times = 4),
  Value = rep(c(3, 2, 1, 2, 5, 3, 3, 4, 4, 2, 2, 6, 2, 5, 1, 1), each = 5)
)
# filter() runs first, so rows with Value <= 3 are dropped before Result is added
x <- data %>%
  filter(Value > 3) %>%
  group_by(Condition) %>%
  mutate(Result = ifelse(n_distinct(Names) > 1, "pass", "fail"))
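To make the problem concrete, here is a minimal sketch of the same filter-then-mutate pattern on a smaller tibble. The `toy` data and its values are made up for illustration and are not the data above; the threshold follows the stated rule (Value greater than 3).

```r
library(dplyr)

# Hypothetical toy data: two conditions, two names each
toy <- tibble(
  Condition = c("A", "A", "B", "B"),
  Names     = c("John", "Paul", "John", "Paul"),
  Value     = c(5, 4, 5, 1)
)

# Same pattern as the attempt: filter first, then label per Condition
filtered <- toy %>%
  filter(Value > 3) %>%
  group_by(Condition) %>%
  mutate(Result = ifelse(n_distinct(Names) > 1, "pass", "fail")) %>%
  ungroup()

nrow(toy)       # 4 rows in the input
nrow(filtered)  # 3 rows: the B/Paul row was removed by filter()
```

The labels themselves come out right, but any row that fails the filter never receives one, which is why the result is shorter than the input.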
What I'm after:
desired <- tibble(
  Condition = rep(c("Apple", "Banana", "Cherry", "Pear"), each = 20),
  Names = rep(rep(c("John", "Paul", "George", "Ringo"), each = 5), times = 4),
  Value = rep(c(3, 2, 1, 2, 5, 3, 3, 4, 4, 2, 2, 6, 2, 5, 1, 1), each = 5),
  Result = rep(c("fail", "pass", "pass", "fail"), each = 20)
)
Thanks
CodePudding user response:
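Instead of filtering the rows first, subset Names inside the call to n_distinct(). The condition Value > 3 then only affects which names are counted within each Condition, so no rows are dropped and Result has the same length as the original data.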
library(dplyr)

output <- data %>%
  group_by(Condition) %>%
  # count distinct Names among rows with Value > 3, within each Condition
  mutate(Result = ifelse(n_distinct(Names[Value > 3]) > 1, "pass", "fail")) %>%
  ungroup()
identical(output, desired)
[1] TRUE
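If a separate per-Condition lookup table is easier to reason about, a similar result can probably be reached by summarising first and joining the labels back. This is only a sketch of that alternative, not the approach used above; the `toy`, `labels`, and `joined` names and the toy values are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: two conditions, two names each
toy <- tibble(
  Condition = c("A", "A", "B", "B"),
  Names     = c("John", "Paul", "John", "Paul"),
  Value     = c(5, 4, 5, 1)
)

# One row per Condition holding its label
labels <- toy %>%
  group_by(Condition) %>%
  summarise(
    Result = if_else(n_distinct(Names[Value > 3]) > 1, "pass", "fail"),
    .groups = "drop"
  )

# Join the labels back; left_join() keeps every row of toy
joined <- toy %>%
  left_join(labels, by = "Condition")

nrow(joined)  # 4, same as nrow(toy)
```

The grouped mutate() is shorter; the join version mainly helps when the per-Condition labels are also needed on their own.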
