Combine dataset using if else{} in for loop in R

I need a function that drops factor variables with more than two levels (here, cyl), takes the mean of the numeric variables, and uses prop.table() (proportions summing to 1) for the remaining factor variables. In the end it should produce a dataset like the expected answer below. Many thanks in advance.

`head(mtcars)
    mtcars$vs <- as.factor(mtcars$vs) 
    mtcars$cyl <- as.factor(mtcars$cyl) # should be removed from the final dataset
    #var <- colnames(mtcars); var
    Summ.Continuous <- tab.prob <- out <- NULL
    
    myfunction <- function(var,df) {
      
      df <- df[, !sapply(df, is.character)] #Remove Character Columns
      
      for (j in 1:ncol(df)) {
        if(is.factor(df[,j])){ 
          tab.prob[j] <- prop.table(table(df[,j]))
          
        } else {
          Summ.Continuous[j] <- describe(df)$mean
          
        }} 
      out <- list(tab.prob, Summ.Continuous)
      return(out)}
    
    myfunction(var, mtcars)

Expected Answer

    mpg 20.09
    cyl NA
    disp 230.7
    hp 146.7
    drat 3.597
    wt 3.217
    qsec 17.85
    vs 0.4375 # from prop.table(), proportions sum to 1
    am 0.4062
    gear 3.688
    carb 2.812 `

CodePudding user response:

With the tidyverse, we can select columns conditionally using where() and summarise them conditionally using across(). For example:

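library(tidyverse)

mtcars %>%
  mutate(vs = as.factor(vs),
         cyl = as.factor(cyl)) %>%
  # drop factors with more than two levels (here, cyl);
  # nlevels() returns the level count, whereas levels() returns the labels
  select(!where(~ is.factor(.x) && nlevels(.x) > 2)) %>%
  # mean for numeric columns; proportion of the second level for factors
  summarise(across(where(is.numeric), mean),
            across(where(is.factor), ~ prop.table(table(.x))[2]))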

       mpg     disp       hp     drat      wt     qsec      am   gear   carb     vs
1 20.09062 230.7219 146.6875 3.596563 3.21725 17.84875 0.40625 3.6875 2.8125 0.4375
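
To make the `[2]` index in the factor summary less opaque, here is a small sketch of what `prop.table(table(...))` returns for `vs`. It only uses base R and the built-in `mtcars` data; the variable names `p` and `share_second_level` are illustrative and not part of the original answer.

```r
# vs is coded 0/1 in mtcars. table() counts each level and
# prop.table() rescales those counts so that they sum to 1.
p <- prop.table(table(factor(mtcars$vs)))

p        # one proportion per level ("0" and "1")
sum(p)   # the proportions add up to 1

# [2] picks the proportion of the second level, here "1"
share_second_level <- as.numeric(p[2])
share_second_level
```

Indexing by position assumes the level of interest is the second one; `p["1"]` selects it by name instead.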

The pipeline can also be run with the tidylog package loaded, which reports what happened at each step; here it confirms that cyl was dropped from the output.

library(tidylog)
# ... previous code here ...

mutate: converted 'cyl' from double to factor (0 new NA)
        converted 'vs' from double to factor (0 new NA)
select: dropped one variable (cyl)
summarise: now one row and 10 columns, ungrouped
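
Since the question asked for a function, the same logic could also be wrapped in base R, roughly as sketched below. The function name `summarise_columns` and its design are assumptions for illustration and are not from the original post. Unlike the pipeline, this sketch returns `NA` for factors with more than two levels, which matches the `cyl NA` row in the expected answer.

```r
# Hypothetical helper: the name and structure are assumptions.
summarise_columns <- function(df) {
  # Remove character columns, as the question's attempt does
  df <- df[, !vapply(df, is.character, logical(1)), drop = FALSE]

  # One value per column, in the original column order
  vapply(df, function(x) {
    if (is.factor(x) && nlevels(x) > 2) {
      NA_real_                              # factor with too many levels
    } else if (is.factor(x)) {
      as.numeric(prop.table(table(x))[2])   # share of the second level
    } else {
      mean(x)                               # numeric column
    }
  }, numeric(1))
}

dat <- mtcars
dat$vs  <- as.factor(dat$vs)
dat$cyl <- as.factor(dat$cyl)

res <- summarise_columns(dat)

# Two-column dataset, one row per variable
out <- data.frame(variable = names(res), value = unname(res))
out
```

To drop `cyl` entirely instead of reporting `NA`, one option is `out[!is.na(out$value), ]`.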