I have the data frame df below. I need to calculate the Pass percentage after excluding the rows whose status is '-', along with the average values of pred1 and pred2:
df <- data.frame(
  name = c('A', 'B', 'C', 'D', 'E'),
  status = c('Pass', 'Fail', '-', 'Pass', 'Pass'),
  real = c(10, NA, 8, 9, 4),
  pred1 = c(50, 20, NA, 14, 11),
  pred2 = c(12, 12, 8, NA, 6)
)
df:
  name status real pred1 pred2
1    A   Pass   10    50    12
2    B   Fail   NA    20    12
3    C      -    8    NA     8
4    D   Pass    9    14    NA
5    E   Pass    4    11     6
The expected result:
   name status real pred1 pred2
1     A   Pass   10    50    12
2     B   Fail   NA    20    12
3     C      -    8    NA     8
4     D   Pass    9    14    NA
5     E   Pass    4    11     6
6 total   0.75   NA 23.75   9.5
I thought of computing the values below and binding them to df, but that is neither concise nor elegant:
library(dplyr)
pass_percent <- nrow(df %>% filter(status == 'Pass')) / nrow(df %>% filter(status != '-'))
avg_pred1 <- mean(df$pred1, na.rm = TRUE)
avg_pred2 <- mean(df$pred2, na.rm = TRUE)
How could I achieve that in a more concise way with R's pipe?
CodePudding user response:
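What about tibble::add_row?
library(dplyr)   # provides %>%
library(tibble)  # provides add_row()

df %>%
  add_row(name = "total",
          # share of "Pass" among the rows whose status is not "-"
          status = as.character(mean(df$status[df$status != "-"] == "Pass")),
          real = mean(df$real),  # NA, because real contains an NA and na.rm is not set
          pred1 = mean(df$pred1, na.rm = TRUE),
          pred2 = mean(df$pred2, na.rm = TRUE))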
   name status real pred1 pred2
1     A   Pass   10 50.00  12.0
2     B   Fail   NA 20.00  12.0
3     C      -    8    NA   8.0
4     D   Pass    9 14.00    NA
5     E   Pass    4 11.00   6.0
6 total   0.75   NA 23.75   9.5
Explanation of as.character(mean(df$status[df$status != "-"] == "Pass")):
df$status[df$status != "-"] is the vector of df$status without the elements equal to "-" (so only Pass and Fail).
df$status[df$status != "-"] == "Pass" is TRUE if df$status is "Pass", FALSE otherwise.
mean(...) is possible because TRUE and FALSE values are coerced to numeric when the mean is computed.
as.character(...) is needed because df$status is a character variable.
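
To make the explanation concrete, here is a minimal base R sketch that evaluates the expression one step at a time on the status values from the question. The intermediate names (kept, is_pass, pass_share, pass_label) are hypothetical and only there for illustration; they are not part of the answer's code.

```r
# Status values copied from the question's data frame
status <- c("Pass", "Fail", "-", "Pass", "Pass")

# Step 1: drop the "-" entries, leaving only "Pass" and "Fail"
kept <- status[status != "-"]
print(kept)        # "Pass" "Fail" "Pass" "Pass"

# Step 2: compare each remaining entry with "Pass"
is_pass <- kept == "Pass"
print(is_pass)     # TRUE FALSE TRUE TRUE

# Step 3: the mean of a logical vector is the share of TRUE values
pass_share <- mean(is_pass)
print(pass_share)  # 0.75

# Step 4: convert to character so the value fits the character column status
pass_label <- as.character(pass_share)
print(pass_label)  # "0.75"
```

Three of the four remaining rows are "Pass", which is where the 0.75 in the total row comes from.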
