I'm working with Excel files which have both text and numerical data, tracked by an ID variable whose name has a space in it. Something like this:
library(tidyverse)
FirstName <- c("George", "Steven", "Kathryn", "Jane")
LastName <- c("Lucas", "Spielberg", "Bigelow", "Campion")
v3 <- (1:4)
value <- c(123, 456, 789, 012)
x <- tibble(FirstName, LastName, v3, value)
names(x) <- c("FirstName", "LastName", "my ID", "value")
x
# A tibble: 4 × 4
FirstName LastName `my ID` value
<chr> <chr> <int> <dbl>
1 George Lucas 1 123
2 Steven Spielberg 2 456
3 Kathryn Bigelow 3 789
4 Jane Campion 4 12
I want to write a function which both unite the text columns and summarise the numeric ones, like so:
f <- function(theData, theID) {
theOutput <- theData %>%
unite("FullName", FirstName, LastName) %>%
group_by(FullName, theID) %>%
summarise(value, sum)
return(theOutput)
}
But, when I run the function I get an error:
> f(x, `my ID`)
Error: Must group by variables found in `.data`.
* Column `theID` is not found.
The same error is returned if I run f(x, 'my ID'). I assume this problem is related to the whitespace in the ID variable name.
Is there a clean solution to this problem where I could use the variable names as is, or do I have to do something like strip all of the whitespace out of the variable names?
CodePudding user response:
If we are passing unquoted column names (single/double quoted), then use curly-curly operator ({{}}) to evaluate within the tidyverse functions - i.e. group_by step. In addition, there is a syntax error in summarise i.e. the function sum should be on the value column (sum(value))
f <- function(theData, theID) {
theOutput <- theData %>%
unite("FullName", FirstName, LastName) %>%
group_by(FullName, {{theID}}) %>%
summarise(value = sum(value, na.rm = TRUE), .groups = 'drop')
return(theOutput)
}
-testing
f(x, `my ID`)
# A tibble: 4 × 3
FullName `my ID` value
<chr> <int> <dbl>
1 George_Lucas 1 123
2 Jane_Campion 4 12
3 Kathryn_Bigelow 3 789
4 Steven_Spielberg 2 456
NOTE: na.rm = TRUE is added in sum (in case there are any missing value and want to remove it). Also, the .groups = 'drop' added to remove the group attribute (by default, it drops the last group in the order and throws a message in the console)
