If I have a DataFrame with a number of columns, how do I count variables that exceed a threshold?

Time:02-02

So say I have a dataframe like this:

      variable1  variable2   variable3
2    0.58955148 0.56320222  0.98544012
3    0.33730801 0.65952594  2.27159478
4    0.99988849 2.55988180  0.34683483
5    1.47543636 0.43682811  0.71149259

And I want a function like this: filter <- function(dataframe, threshold){}, which could be applied like this: filter(dataframe, c(1, 1, 1)). It should return a new data frame that counts, for each variable, the number of values that exceed the given threshold, like this:

   Variable  NbrOfExeeding  MeanOfCorrect
1  variable1             1           Mean
2  variable2             1           Mean
3  variable3             1           Mean

So for each variable: the number of data points exceeding the threshold, and the mean of the rest.

I'm not sure where to start on this. I roughly know what to do but not which functions to use: loop through variable1 and count the values greater than the threshold, and do the same for variable2 and variable3; then take the mean of variable1 omitting the values greater than the threshold, again the same for variable2 and variable3; and finally collect the results in a data frame. But how exactly?

CodePudding user response:

A possible solution:

library(tidyverse)

df <- data.frame(
  variable1 = c(0.58955148, 0.33730801, 0.99988849, 1.47543636),
  variable2 = c(0.56320222, 0.65952594, 2.5598818, 0.43682811),
  variable3 = c(0.98544012, 2.27159478, 0.34683483, 0.71149259)
)

thresholds <- c(1,1,1)

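filt <- function(df, thresholds)
{  
  # Name the thresholds by column so each group looks up its own value;
  # cur_group_id() follows sorted group order, which can differ from column order.
  thresholds <- setNames(thresholds, names(df))
  df %>% 
    pivot_longer(cols = everything(), names_to = "variable") %>% 
    group_by(variable) %>% 
    summarise(NbrOfExeeding = sum(value > thresholds[[cur_group()$variable]]),
              MeanOfCorrect = mean(value[value <= thresholds[[cur_group()$variable]]]))
}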

filt(df, thresholds)

#> # A tibble: 3 × 3
#>   variable  NbrOfExeeding MeanOfCorrect
#>   <chr>             <int>         <dbl>
#> 1 variable1             1         0.642
#> 2 variable2             1         0.553
#> 3 variable3             1         0.681
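
One edge case to keep in mind: if every value in a column exceeds its threshold, the subset passed to mean() is empty, and mean() of an empty numeric vector returns NaN. The sketch below shows that behaviour and one way to return NA instead. The helper name safe_mean_below is hypothetical and not part of the answer above.

```r
# Hypothetical helper (an assumption, not from the answer above): mean of the
# values at or below the threshold, or NA when no value qualifies.
safe_mean_below <- function(x, threshold) {
  kept <- x[x <= threshold]
  if (length(kept) == 0) NA_real_ else mean(kept)
}

mean(numeric(0))                         # NaN: mean of an empty vector
safe_mean_below(c(1.5, 2.0, 3.2), 1)     # NA: nothing at or below 1
safe_mean_below(c(0.25, 0.75, 3.2), 1)   # 0.5: mean of 0.25 and 0.75
```

Whether NaN or NA is the better result depends on how the summary is used later; this only illustrates the difference.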

CodePudding user response:

With base R:

f <- function(dataframe, threshold){
  # mapply pairs each column with its threshold; \(x, y) needs R >= 4.1
  NbrOfExeeding <- mapply(\(x, y) sum(x > y), dataframe, threshold)
  # mean of the values that do not exceed the threshold
  MeanOfCorrect <- mapply(\(x, y) mean(x[x <= y]), dataframe, threshold)
  
  df <- data.frame(Variable = names(NbrOfExeeding), NbrOfExeeding, MeanOfCorrect)
  rownames(df) <- NULL
  return(df)
}

output:

f(df, c(1,1,1))

#    Variable NbrOfExeeding MeanOfCorrect
# 1 variable1             1     0.6422493
# 2 variable2             1     0.5531854
# 3 variable3             1     0.6812558

f(df, c(0.5,1,3))
#    Variable NbrOfExeeding MeanOfCorrect
# 1 variable1             3     0.3373080
# 2 variable2             1     0.5531854
# 3 variable3             0     1.0788406
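
For comparison, the steps listed in the question (loop over each variable, count the values above the threshold, take the mean of the rest, collect everything in a data frame) translate fairly directly into an explicit loop. This is only a sketch: the function name count_exceeding is hypothetical, and it assumes numeric columns without missing values and one threshold per column.

```r
# Example data from the question.
df <- data.frame(
  variable1 = c(0.58955148, 0.33730801, 0.99988849, 1.47543636),
  variable2 = c(0.56320222, 0.65952594, 2.5598818, 0.43682811),
  variable3 = c(0.98544012, 2.27159478, 0.34683483, 0.71149259)
)

# Hypothetical name; assumes one threshold per column and no NA values.
count_exceeding <- function(dataframe, threshold) {
  stopifnot(length(threshold) == ncol(dataframe))
  n <- ncol(dataframe)
  NbrOfExeeding <- integer(n)
  MeanOfCorrect <- numeric(n)
  for (i in seq_len(n)) {
    x <- dataframe[[i]]                    # i-th column as a vector
    exceeds <- x > threshold[i]            # TRUE where the value is too large
    NbrOfExeeding[i] <- sum(exceeds)       # count of exceeding values
    MeanOfCorrect[i] <- mean(x[!exceeds])  # mean of the remaining values
  }
  data.frame(Variable = names(dataframe), NbrOfExeeding, MeanOfCorrect)
}

count_exceeding(df, c(1, 1, 1))
```

On the example data this should give the same counts and means as the vectorised versions above; the mapply form is shorter, but the loop makes each step explicit.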