I'm struggling to write a function in R that takes a data.frame's variable names as arguments.
Say, for example, that I have this data:
test.df <-
  data.frame(
    variable_1 = sample(letters[1:4], 10, replace = TRUE),
    variable_2 = rnorm(10, 10, 3),
    variable_3 = rnorm(10, 40, 15))
test.df
   variable_1 variable_2 variable_3
1           c   5.514034   59.23525
2           a  10.515690   31.94552
3           d  11.845118   47.39481
4           c   8.481335   22.32198
5           d   7.945798   29.02631
6           c   9.631182   41.90519
7           c   9.348816   53.79478
8           a   4.559642   58.47290
9           d   9.876674   53.53151
10          c  12.955443   49.84759
I need to create a function that accesses any given variable by its name and, for example, computes its mean and reports it in the form 'The mean is: X' (where 'X' is the mean value). So far I've tried this:
my.function <- function(df, variable) {
  paste0("The mean is: ",
         round(mean(df$variable), 2))
}
But when I call my.function on my test.df, it is clearly not doing the job:
> my.function(test.df, variable_2)
[1] "The mean is: NA"
So my questions are:
How do I pass variable names as arguments to a function? I know there are various ways to do this, since other libraries accept either variable_2 or "variable_2". When more than one variable is needed, some functions take the variables unquoted, separated by commas (variable_2, variable_3, as in dplyr::select()), while others require the target variables as a character vector (c("variable_2", "variable_3"), as in reshape2::melt()).
BONUS: I really like that, in functions that take more than one variable, you can press Tab and the list of available variables shows up (as in dplyr::select(), for example). How do I get this feature when building my own functions?
Thanks in advance! :)
CodePudding user response:
If the column name is passed as an unquoted argument, convert it to a string with deparse/substitute and use [[ instead of $. Also, check whether the value returned by substitute is a symbol and only then call deparse, so that the function accepts both quoted and unquoted names:
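my.function <- function(df, variable) {
  # capture the argument without evaluating it
  variable <- substitute(variable)
  # an unquoted name arrives as a symbol; turn it into a string
  if (is.symbol(variable)) variable <- deparse(variable)
  paste0("The mean is: ",
         round(mean(df[[variable]], na.rm = TRUE), 2))
}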
-testing
> my.function(test.df, variable_2)
[1] "The mean is: 9.86"
> my.function(test.df, "variable_2")
[1] "The mean is: 9.86"
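One caveat with capturing the argument this way: if the caller passes a variable that holds the column name, substitute sees the variable's own name rather than its value. Below is a minimal sketch of the same idea with an added check that the captured name is really a column. The names col_mean and demo.df are hypothetical and used only for illustration.

```r
# Hypothetical variant of my.function (col_mean is not from the answer above):
# same substitute/deparse idea, plus a check that the name is a column of df.
col_mean <- function(df, variable) {
  variable <- substitute(variable)                        # capture unevaluated
  if (is.symbol(variable)) variable <- deparse(variable)  # symbol -> string
  ok <- is.character(variable) && length(variable) == 1 &&
    variable %in% names(df)
  if (!ok) {
    stop("not a column of df: ", paste(deparse(variable), collapse = ""))
  }
  paste0("The mean is: ", round(mean(df[[variable]], na.rm = TRUE), 2))
}

# Small illustrative data set (assumed, not the question's test.df)
demo.df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 60))

col_mean(demo.df, y)         # unquoted name
col_mean(demo.df, "y")       # quoted name
col <- "y"
try(col_mean(demo.df, col))  # error: the symbol `col` is captured, not "y"
```

Without the check, the last call would look up a column named "col", find nothing, and report NA. When column names are held in variables, a function that takes a plain string and uses df[[variable]] directly is the simpler choice.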
If we want the means of multiple columns, use colMeans and pass the variables as a character vector:
my.function <- function(df, variable) {
  v1 <- colMeans(df[variable], na.rm = TRUE)
  sprintf("The mean of %s: %f", names(v1), v1)
}
-testing
> my.function(test.df, c("variable_2", "variable_3"))
[1] "The mean of variable_2: 9.860057" "The mean of variable_3: 42.317997"
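To take several unquoted names separated by commas, as the question describes for dplyr::select(), one base R option is to capture `...` with substitute. This is only a sketch under that assumption, not how dplyr implements it. The names col_means and demo.df are hypothetical.

```r
# Hypothetical helper (col_means is an illustrative name): accepts column
# names through `...`, either unquoted or quoted.
col_means <- function(df, ...) {
  # substitute(list(...)) yields the unevaluated call list(x, y, ...);
  # dropping the first element leaves the arguments themselves.
  args <- as.list(substitute(list(...)))[-1]
  vars <- vapply(args, function(v) {
    if (is.character(v)) v else deparse(v)  # keep strings, convert symbols
  }, character(1))
  v1 <- colMeans(df[vars], na.rm = TRUE)
  sprintf("The mean of %s: %.2f", names(v1), v1)
}

# Small illustrative data set (assumed, not the question's test.df)
demo.df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 60))

col_means(demo.df, x, y)
# [1] "The mean of x: 2.00"  "The mean of y: 30.00"
col_means(demo.df, "x", y)  # quoted and unquoted names can be mixed
```

As with colMeans in general, every selected column has to be numeric, so passing variable_1 from the question's data would raise an error.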
CodePudding user response:
Instead of df$nameOfColumn, you can use:
column <- "nameOfColumn"
df[[column]]
Example:
my.function <- function(df, variable) {
  paste0("The mean is: ",
         round(mean(df[[variable]]), 2))
}
> my.function(test.df, "variable_2")
[1] "The mean is: 11.88"
This is described in the R Language Definition, under Indexing.
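To see why the original attempt printed NA, it may help to compare the two operators directly. The sketch below uses a small made-up data frame (demo.df is a hypothetical name, not the question's data).

```r
# Small illustrative data set (assumed, not the question's test.df)
demo.df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 60))
variable <- "y"

# `$` does not evaluate its right-hand side: it looks for a column that is
# literally called "variable", finds none, and returns NULL.
demo.df$variable
is.null(demo.df$variable)   # TRUE

# `[[` evaluates `variable` first, so it looks up the column "y".
demo.df[[variable]]         # 10 20 60

# mean(NULL) gives NA (with a warning), which is where the NA came from.
suppressWarnings(mean(demo.df$variable))
```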
