I need a function that excludes factor columns with more than two levels (in this case, cyl). It must take the mean of the numeric columns and, for the remaining two-level factor columns, the proportion of level 1 from prop.table(). At the end, it should create a dataset like the expected answer below. Many thanks in advance.
`head(mtcars)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$cyl <- as.factor(mtcars$cyl) # should be removed from the final dataset
#var <- colnames(mtcars); var
Summ.Continuous <- tab.prob <- out <- NULL
myfunction <- function(var,df) {
df <- df[, !sapply(df, is.character)] #Remove Character Columns
for (j in 1:ncol(df)) {
if(is.factor(df[,j])){
tab.prob[j] <- prop.table(table(df[,j]))
} else {
Summ.Continuous[j] <- describe(df)$mean
}}
out <- list(tab.prob, Summ.Continuous)
return(out)}
myfunction(var, mtcars)
Expected Answer
mpg 20.09
cyl NA
disp 230.7
hp 146.7
drat 3.597
wt 3.217
qsec 17.85
vs 0.4375 # prop.table() proportion for level 1
am 0.4062
gear 3.688
carb 2.812 `
CodePudding user response:
Using the tidyverse, we can select columns conditionally with where() and summarise conditionally with across(). For example:
library(tidyverse)
mtcars %>%
mutate(vs = as.factor(vs),
cyl = as.factor(cyl)) %>%
select(!where(~ is.factor(.x) && nlevels(.x) > 2)) %>%
summarise(across(where(is.numeric), mean),
across(where(is.factor), ~ prop.table(table(.x))[2]))
mpg disp hp drat wt qsec am gear carb vs
1 20.09062 230.7219 146.6875 3.596563 3.21725 17.84875 0.40625 3.6875 2.8125 0.4375
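Since the question asks for a function, here is a minimal base R sketch of the same idea. The name summarise_columns and the max_levels argument are hypothetical (not from the original post), and it assumes that "probability for a factor" means the share of rows in the factor's last level, which is level 1 for vs. Unlike the pipeline above, it keeps cyl in the result as NA, matching the expected answer in the question.

```r
# Hypothetical helper: the name and the max_levels argument are assumptions
# made for illustration, not part of the original post.
summarise_columns <- function(df, max_levels = 2) {
  # Drop character columns, as the question's code does
  df <- df[, !sapply(df, is.character), drop = FALSE]

  sapply(df, function(x) {
    if (is.factor(x)) {
      # Factors with too many levels are reported as NA
      if (nlevels(x) > max_levels) return(NA_real_)
      # Share of rows in the last level (level "1" for vs)
      mean(x == levels(x)[nlevels(x)])
    } else {
      # Everything else is treated as numeric
      mean(x)
    }
  })
}

# Work on a copy so the built-in mtcars is left untouched
mt <- mtcars
mt$vs <- as.factor(mt$vs)
mt$cyl <- as.factor(mt$cyl)

result <- summarise_columns(mt)
round(result, 4)
```

The result is a named numeric vector with one entry per column; wrapping it in data.frame(value = result) would give a one-column dataset laid out like the expected answer.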
This can be combined with the tidylog package, which reports what happened at each step. Here it is useful for confirming that cyl has been dropped from the output.
library(tidylog)
# previous code here
mutate: converted 'cyl' from double to factor (0 new NA)
converted 'vs' from double to factor (0 new NA)
select: dropped one variable (cyl)
summarise: now one row and 10 columns, ungrouped
