Say I have a data frame like this:
   variable1  variable2  variable3
2 0.58955148 0.56320222 0.98544012
3 0.33730801 0.65952594 2.27159478
4 0.99988849 2.55988180 0.34683483
5 1.47543636 0.43682811 0.71149259
And I want a function like this:
filter <- function(dataframe, threshold){}
Applied, it could look like this:
filter(dataframe, c(1, 1, 1))
It should then return a new data frame that, for each variable, counts the number of values exceeding the given threshold, like this:
   Variable NbrOfExeeding MeanOfCorrect
1 variable1             1          Mean
2 variable2             1          Mean
3 variable3             1          Mean
That is, the number of data points exceeding the threshold, and the mean of the remaining ones.
I'm not sure where to start. I roughly know what needs to happen but not which functions to use: loop through variable1 and count the values greater than its threshold, and do the same for variable2 and variable3; take the mean of variable1 omitting the values greater than the threshold, and likewise for variable2 and variable3; then collect the results in a data frame. But how exactly?
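The steps described above (loop over the columns, count the exceeding values, average the rest, collect everything in a data frame) translate fairly directly into base R. A minimal sketch, assuming the thresholds are given in the same order as the columns and the data contain no NA values; `filter_loop` is a hypothetical name, chosen so it does not mask `dplyr::filter()` or `stats::filter()`:

```r
# Example data from the question
df <- data.frame(
  variable1 = c(0.58955148, 0.33730801, 0.99988849, 1.47543636),
  variable2 = c(0.56320222, 0.65952594, 2.55988180, 0.43682811),
  variable3 = c(0.98544012, 2.27159478, 0.34683483, 0.71149259)
)

# Hypothetical helper; assumes threshold[i] belongs to column i and no NAs
filter_loop <- function(dataframe, threshold) {
  n <- ncol(dataframe)
  # Pre-allocate one result slot per column
  nbr <- integer(n)
  avg <- numeric(n)
  for (i in seq_len(n)) {
    x <- dataframe[[i]]
    above <- x > threshold[i]   # TRUE where the value exceeds the threshold
    nbr[i] <- sum(above)        # number of exceeding values
    avg[i] <- mean(x[!above])   # mean of the values at or below the threshold
  }
  data.frame(Variable = names(dataframe),
             NbrOfExeeding = nbr,
             MeanOfCorrect = avg)
}

filter_loop(df, c(1, 1, 1))
```

With thresholds of 1 this should give one exceeding value per column and means of roughly 0.642, 0.553 and 0.681. The loop is easy to follow but verbose; the answers below express the same per-column logic more compactly.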
Answer:
A possible solution:
library(tidyverse)
df <- data.frame(
  variable1 = c(0.58955148, 0.33730801, 0.99988849, 1.47543636),
  variable2 = c(0.56320222, 0.65952594, 2.5598818, 0.43682811),
  variable3 = c(0.98544012, 2.27159478, 0.34683483, 0.71149259)
)
thresholds <- c(1,1,1)
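filt <- function(df, thresholds)
{
  df %>%
    pivot_longer(cols = everything(), names_to = "variable") %>%
    group_by(variable) %>%
    # cur_group_id() numbers the groups in sorted order of `variable`, so this
    # assumes the column names sort in the same order as `thresholds`
    summarise(NbrOfExeeding = sum(value > thresholds[cur_group_id()]),
              MeanOfCorrect = mean(value[value <= thresholds[cur_group_id()]]))
}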
filt(df, thresholds)
#> # A tibble: 3 × 3
#> variable NbrOfExeeding MeanOfCorrect
#> <chr> <int> <dbl>
#> 1 variable1 1 0.642
#> 2 variable2 1 0.553
#> 3 variable3 1 0.681
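Indexing `thresholds` with `cur_group_id()` pairs each threshold with a group in sorted-name order rather than in column order, which only coincides here because `variable1` to `variable3` happen to sort that way (a column named `variable10` would break it). A sketch of a variant that joins the thresholds on by column name instead; `filt_by_name` is a hypothetical name, and it still assumes `thresholds` is given in column order:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  variable1 = c(0.58955148, 0.33730801, 0.99988849, 1.47543636),
  variable2 = c(0.56320222, 0.65952594, 2.5598818, 0.43682811),
  variable3 = c(0.98544012, 2.27159478, 0.34683483, 0.71149259)
)

# Hypothetical helper: pairs each threshold with its column by name,
# so the result does not depend on how group_by() orders the groups
filt_by_name <- function(df, thresholds) {
  # One row per column name, holding that column's threshold
  thr <- tibble(variable = names(df), threshold = thresholds)
  df %>%
    pivot_longer(cols = everything(), names_to = "variable") %>%
    left_join(thr, by = "variable") %>%   # every value row now carries its threshold
    group_by(variable) %>%
    summarise(NbrOfExeeding = sum(value > threshold),
              MeanOfCorrect = mean(value[value <= threshold]))
}

filt_by_name(df, c(1, 1, 1))
```

For example, with `data.frame(b = c(1, 5), a = c(1, 5))` and thresholds `c(2, 10)`, this version reports one exceeding value for `b` and none for `a`, whereas indexing by `cur_group_id()` would pair `a` with 2 and `b` with 10.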
Answer:
With base R:
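f <- function(dataframe, threshold){
  # `\(x, y)` is shorthand for function(x, y) and needs R >= 4.1.0;
  # mapply() pairs each column with its threshold by position
  NbrOfExeeding <- mapply(\(x, y) sum(x > y), dataframe, threshold)
  MeanOfCorrect <- mapply(\(x, y) mean(x[x <= y]), dataframe, threshold)
  # mapply() keeps the column names, which data.frame() would turn into row names
  out <- data.frame(Variable = names(NbrOfExeeding), NbrOfExeeding, MeanOfCorrect)
  rownames(out) <- NULL
  return(out)
}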
Usage and output:
f(df, c(1,1,1))
# Variable NbrOfExeeding MeanOfCorrect
# 1 variable1 1 0.6422493
# 2 variable2 1 0.5531854
# 3 variable3 1 0.6812558
f(df, c(0.5,1,3))
# Variable NbrOfExeeding MeanOfCorrect
# 1 variable1 3 0.3373080
# 2 variable2 1 0.5531854
# 3 variable3 0 1.0788406
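The same result can also be computed without any per-column function calls, by comparing the whole matrix against the thresholds at once. A sketch, assuming every column is numeric and there are no NA values; `f_vec` is a hypothetical name:

```r
df <- data.frame(
  variable1 = c(0.58955148, 0.33730801, 0.99988849, 1.47543636),
  variable2 = c(0.56320222, 0.65952594, 2.5598818, 0.43682811),
  variable3 = c(0.98544012, 2.27159478, 0.34683483, 0.71149259)
)

# Hypothetical vectorized variant; assumes all-numeric columns and no NAs
f_vec <- function(dataframe, threshold) {
  m <- as.matrix(dataframe)
  # Logical matrix: TRUE where a value exceeds its column's threshold
  # (MARGIN = 2 applies threshold[j] to column j)
  exceed <- sweep(m, 2, threshold, ">")
  keep <- !exceed
  data.frame(
    Variable = colnames(m),
    NbrOfExeeding = unname(colSums(exceed)),
    # sum of kept values / number of kept values; 0/0 gives NaN, like mean(numeric(0))
    MeanOfCorrect = unname(colSums(m * keep) / colSums(keep))
  )
}

f_vec(df, c(0.5, 1, 3))
```

This should agree with `f(df, c(0.5, 1, 3))`, except that `colSums()` returns the counts as doubles rather than integers.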
