I am trying to run a Wilcoxon test (wilcox.test) on every variable in a data frame from within a script, without having to type out all of the variables I want to compare before I run the script.
So far, as.formula only seems to pick up the first of the variables I want to examine. If I run:
n <- names(df)
f <- as.formula(paste(n[!n %in% "cluster"], paste("~ cluster", collapse = " ")))
f
I get a formula for the first variable only (first variable ~ cluster), along with this warning:
Using formula(x) is deprecated when x is a character vector of length > 1.
Consider formula(paste(x, collapse = " ")) instead.
Does anyone know how to build this the other way around, so that I get every variable ~ cluster to pass to the test? If I type the variables out manually (formula = c(x1, x2, x3 ...) ~ cluster) and run the Wilcoxon test, I get the output I expect. I just want to define them without typing them by hand.
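Why only one formula comes back: in the call above, `collapse` is passed to the inner `paste("~ cluster", ...)`, which has a single element, so it has no effect. The outer `paste()` is vectorised, recycles `"~ cluster"` across every variable name, and returns one string per variable, while `as.formula()` can only turn one string into a formula. A minimal sketch that reproduces this, assuming hypothetical column names `x1`, `x2`, `x3`:

```r
# Hypothetical column names standing in for names(df)
n <- c("x1", "x2", "x3", "cluster")

# collapse only applies to the inner paste(), which has one element,
# so the outer paste() recycles "~ cluster" over all three names
s <- paste(n[!n %in% "cluster"], paste("~ cluster", collapse = " "))
print(s)
# [1] "x1 ~ cluster" "x2 ~ cluster" "x3 ~ cluster"

# A character vector of length 3, not a single formula string
length(s)
# [1] 3
```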
Answer:
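If you did not mean a single formula with every variable on the left-hand side, such as
as.formula(paste(paste(setdiff(n, 'cluster'), collapse=' + '), '~ cluster'))
# x1 + x2 + x3 ~ cluster
you could use lapply and setdiff to build one formula per variable: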
foo <- lapply(setdiff(n, 'cluster'), \(x) as.formula(paste(x, '~ cluster')))
foo
# [[1]]
# x1 ~ cluster
# <environment: 0x55b1ca157078>
#
# [[2]]
# x2 ~ cluster
# <environment: 0x55b1ca159708>
#
# [[3]]
# x3 ~ cluster
# <environment: 0x55b1ca1eed50>
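The same list can also be built with base R's `stats::reformulate()`, which constructs `response ~ terms` from character names without pasting and parsing strings. A sketch, assuming the hypothetical names `x1`, `x2`, `x3`; naming the list by variable (an addition, not part of the code above) makes later results easier to identify:

```r
# Hypothetical column names, matching the Data section of this answer
n <- c(paste0("x", 1:3), "cluster")
vars <- setdiff(n, "cluster")

# reformulate(termlabels, response) returns a formula response ~ termlabels
foo2 <- lapply(vars, function(v) reformulate("cluster", response = v))
names(foo2) <- vars

# deparse() shows the formula as text, without the environment line
deparse(foo2$x2)
# [1] "x2 ~ cluster"
```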
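Then pass one element of the list to wilcox.test together with your data frame (df in the question):
wilcox.test(foo[[1]], df)
or run the test on every formula at once:
lapply(foo, \(f) wilcox.test(f, df))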
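A self-contained sketch of the whole loop on simulated data. The data frame `df`, its values and the two cluster labels are made up for illustration; note that the formula method of `wilcox.test()` needs a grouping variable with exactly two levels. Each result is an `htest` object, so the p-values can be collected into one named vector:

```r
set.seed(1)

# Hypothetical data: three numeric variables and a two-level grouping factor
df <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20, mean = 1),
  x3 = runif(20),
  cluster = factor(rep(c("a", "b"), each = 10))
)

# One formula per non-grouping column, as in the answer
vars <- setdiff(names(df), "cluster")
foo <- lapply(vars, function(x) as.formula(paste(x, "~ cluster")))
names(foo) <- vars

# One Wilcoxon rank-sum test per variable
res <- lapply(foo, function(f) wilcox.test(f, data = df))

# Pull the p-value out of each htest result into a named numeric vector
p <- vapply(res, function(r) r$p.value, numeric(1))
print(p)
```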
Note: the \(x) lambda shorthand requires R >= 4.1; on older versions, write function(x) instead.
Data:
n <- c(paste0('x', 1:3), 'cluster')
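To reuse this on other data frames, the steps can be wrapped in a small helper. `wilcox_by_group()` is a hypothetical name, not from any package. It assumes every non-grouping column is numeric and the grouping column has exactly two levels, builds each formula with `stats::reformulate()`, and returns one row per variable with the W statistic and p-value:

```r
# Hypothetical helper: one Wilcoxon test per non-grouping column
wilcox_by_group <- function(data, group = "cluster") {
  vars <- setdiff(names(data), group)
  rows <- lapply(vars, function(v) {
    # Build `v ~ group` directly from the column names
    f <- reformulate(group, response = v)
    test <- wilcox.test(f, data = data)
    data.frame(variable = v,
               W = unname(test$statistic),
               p.value = test$p.value)
  })
  # Stack the one-row data frames into a single results table
  do.call(rbind, rows)
}

# Usage on made-up data with the column names from the Data section
set.seed(42)
df <- data.frame(
  x1 = rnorm(12), x2 = rnorm(12), x3 = rnorm(12),
  cluster = rep(c("a", "b"), each = 6)
)
out <- wilcox_by_group(df)
print(out)
```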
