I'm trying to write a function that returns specific details about outliers (only sex, age, education, and the outlying value). I need to do this for many variables, so I would like to pass the name of the column to the function. Is there a way to do it?
For example, this code should return: f, 27, 12, 110.
my_data= data.frame( sex= c("f", "m", "f", "f", "m"),
age= c(22, 30, 24, 27, 30),
eduyears= c(12,16, 15, 12, 17),
weight= c(53, 70, 60, 110, 75),
height= c(160, 183, 157, 168, 180))
find_outliers= function (my_data, colname) {
out_values= boxplot.stats(my_data$colname)$out
out_ind= which(my_data$colname %in% out_values) #find outliers indices
outliers= my_data[out_ind ,c("sex","age","eduyears", colname)]
return (outliers)
}
find_outliers(weight)
CodePudding user response:
If the function has two arguments, you need to pass both of them in the call; you are passing only one, weight. And because weight is passed as an unquoted name, the function must first convert it to a character string before it can use it to access the column.
Finally, my_data$colname looks for a column literally named colname, so use [[ instead. See the well-known question on how to Dynamically select data frame columns using $ and a vector of column names.
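As a minimal illustration of why `$` does not work with a column name held in a variable, here is a small sketch. The toy data frame `df` and the variable `col` are made up for this example and are not part of the question's data.

```r
# Toy data, assumed only for this illustration
df <- data.frame(weight = c(53, 70, 60))
col <- "weight"

# `$` does not evaluate `col`; it looks for a column literally named "col"
is.null(df$col)
#> [1] TRUE

# `[[` evaluates `col` first, so it extracts the "weight" column
identical(df[[col]], df$weight)
#> [1] TRUE
```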
my_data <- data.frame(sex = c("f", "m", "f", "f", "m"),
age = c(22, 30, 24, 27, 30),
eduyears = c(12,16, 15, 12, 17),
weight = c(53, 70, 60, 110, 75),
height = c(160, 183, 157, 168, 180))
find_outliers <- function (my_data, colname) {
# get the colname as a character string
colname <- as.character(substitute(colname))
out_values <- boxplot.stats(my_data[[colname]])$out
out_ind <- which(my_data[[colname]] %in% out_values) #find outliers indices
outliers <- my_data[out_ind, c("sex","age","eduyears", colname)]
outliers
}
find_outliers(my_data, weight)
#> sex age eduyears weight
#> 4 f 27 12 110
my_data |> find_outliers(weight)
#> sex age eduyears weight
#> 4 f 27 12 110
Created on 2022-11-05 with reprex v2.0.2
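If you need to run this over many columns, note that the `substitute()` approach captures whatever name is typed in the call, so `find_outliers(my_data, col)` with `col <- "weight"` would look for a column called "col". A possible alternative, sketched below, is to pass the column name as a quoted string. The function name `find_outliers_chr` and the input check are assumptions added for this example, not part of the original answer.

```r
my_data <- data.frame(sex = c("f", "m", "f", "f", "m"),
                      age = c(22, 30, 24, 27, 30),
                      eduyears = c(12, 16, 15, 12, 17),
                      weight = c(53, 70, 60, 110, 75),
                      height = c(160, 183, 157, 168, 180))

# Hypothetical variant: `colname` must be a single character string
find_outliers_chr <- function(data, colname) {
  stopifnot(is.character(colname), length(colname) == 1,
            colname %in% names(data))
  out_values <- boxplot.stats(data[[colname]])$out
  out_ind <- which(data[[colname]] %in% out_values)  # outlier row indices
  data[out_ind, c("sex", "age", "eduyears", colname)]
}

find_outliers_chr(my_data, "weight")

# Because the name is an ordinary string, it can come from a vector
cols <- c("weight", "height")
results <- lapply(cols, function(col) find_outliers_chr(my_data, col))
names(results) <- cols
```

With this data, the "weight" call returns the same single row as above, and "height" returns a data frame with zero rows because none of its values fall outside the boxplot whiskers.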
