Home > Back-end >  Error in passing variable names to a function in R
Error in passing variable names to a function in R

Time:02-02

I'm working with Excel files which have both text and numerical data, tracked by an ID variable whose name has a space in it. Something like this:

library(tidyverse)
FirstName <- c("George", "Steven", "Kathryn", "Jane")
LastName <- c("Lucas", "Spielberg", "Bigelow", "Campion")
v3 <- (1:4)
value <- c(123, 456, 789, 012)

x <- tibble(FirstName, LastName, v3, value)
names(x) <- c("FirstName", "LastName", "my ID", "value")
x

# A tibble: 4 × 4
  FirstName LastName   `my ID` value
  <chr>     <chr>        <int> <dbl>
1 George    Lucas            1   123
2 Steven    Spielberg        2   456
3 Kathryn   Bigelow          3   789
4 Jane      Campion          4    12

I want to write a function which both unite the text columns and summarise the numeric ones, like so:

f <- function(theData, theID) {
  theOutput <- theData %>%
    unite("FullName", FirstName, LastName) %>%
    group_by(FullName, theID) %>%
    summarise(value, sum)

  return(theOutput)
}

But, when I run the function I get an error:

> f(x, `my ID`)
 Error: Must group by variables found in `.data`.
* Column `theID` is not found.

The same error is returned if I run f(x, 'my ID'). I assume this problem is related to the whitespace in the ID variable name.

Is there a clean solution to this problem where I could use the variable names as is, or do I have to do something like strip all of the whitespace out of the variable names?

CodePudding user response:

If we are passing unquoted column names (single/double quoted), then use curly-curly operator ({{}}) to evaluate within the tidyverse functions - i.e. group_by step. In addition, there is a syntax error in summarise i.e. the function sum should be on the value column (sum(value))

f <- function(theData, theID) {
  theOutput <- theData %>%
    unite("FullName", FirstName, LastName) %>%
    group_by(FullName, {{theID}}) %>%
    summarise(value = sum(value, na.rm = TRUE), .groups = 'drop')

  return(theOutput)
}

-testing

  f(x, `my ID`)
# A tibble: 4 × 3
  FullName         `my ID` value
  <chr>              <int> <dbl>
1 George_Lucas           1   123
2 Jane_Campion           4    12
3 Kathryn_Bigelow        3   789
4 Steven_Spielberg       2   456

NOTE: na.rm = TRUE is added in sum (in case there are any missing value and want to remove it). Also, the .groups = 'drop' added to remove the group attribute (by default, it drops the last group in the order and throws a message in the console)

  •  Tags:  
  • Related