My data are as follows:
year group date
2019 A 2019-07-15
2019 A 2019-07-25
2019 A 2019-08-01
2019 B 2019-07-15
2019 B 2019-07-30
2020 A 2020-08-01
2020 A 2020-08-03
2020 B 2020-08-01
2020 B 2020-08-20
2020 B 2020-08-25
I would like to calculate the mean number of days between consecutive dates, per year and per group. I tried the following code and received this error:
data_meandays <- data %>%
group_by(year, group)%>%
mutate(Difference = date - lag(date)) %>%
summarize(mean_time = mean(Difference, na.rm=TRUE))
Error in date - lag(date) :
non-numeric argument to binary operator
The class of my date column is Date.
Thank you in advance!
Answer:
The error occurs because the date column is stored as character, not as Date class. It needs to be converted to Date class before taking the difference:
library(dplyr)
data %>%
mutate(date = as.Date(date)) %>%
group_by(year, group) %>%
mutate(Difference = date - lag(date)) %>%
summarize(mean_time = mean(Difference, na.rm=TRUE), .groups = 'drop')
-output
# A tibble: 4 × 3
year group mean_time
<int> <chr> <drtn>
1 2019 A 8.5 days
2 2019 B 15.0 days
3 2020 A 2.0 days
4 2020 B 12.0 days
NOTE: the differences between dates are difftime objects. To convert them to numeric class, apply as.numeric to the column.
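A minimal sketch of that conversion, assuming the same data as in this post and that dplyr is installed. The column name mean_days is only illustrative and does not appear in the original code. The units argument is passed explicitly so the numbers are in days whatever unit the difftime carries.

```r
library(dplyr)

# Same values as the data in this post; date is stored as character
data <- data.frame(
  year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2020L, 2020L, 2020L, 2020L, 2020L),
  group = c("A", "A", "A", "B", "B", "A", "A", "B", "B", "B"),
  date = c("2019-07-15", "2019-07-25", "2019-08-01", "2019-07-15", "2019-07-30",
           "2020-08-01", "2020-08-03", "2020-08-01", "2020-08-20", "2020-08-25")
)

out <- data %>%
  mutate(date = as.Date(date)) %>%
  group_by(year, group) %>%
  # as.numeric() drops the difftime class and keeps the value in days
  mutate(Difference = as.numeric(date - lag(date), units = "days")) %>%
  summarize(mean_days = mean(Difference, na.rm = TRUE), .groups = "drop")

out$mean_days
# [1]  8.5 15.0  2.0 12.0
```

With a plain numeric column, the result can be used directly in arithmetic or plotting without carrying the "days" unit along.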
The OP's error can be reproduced if we don't convert to Date class:
data %>%
group_by(year, group)%>%
mutate(Difference = date - lag(date)) %>%
summarize(mean_time = mean(Difference, na.rm=TRUE))
Error in `mutate()`:
! Problem while computing `Difference = date - lag(date)`.
ℹ The error occurred in group 1: year = 2019, group = "A".
Caused by error in `date - lag(date)`:
! non-numeric argument to binary operator
Run `rlang::last_error()` to see where the error occurred.
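One way to confirm which case applies is to inspect the class of the column before subtracting. The sketch below uses only base R and a short hypothetical vector, date_chr, in place of the real column; with the actual data, the same checks would be run on data$date.

```r
# A character vector that merely looks like dates (hypothetical stand-in for data$date)
date_chr <- c("2019-07-15", "2019-07-25", "2019-08-01")

class(date_chr)             # "character"
inherits(date_chr, "Date")  # FALSE, so date - lag(date) would fail

# After conversion, subtraction is defined and returns a difftime
date_conv <- as.Date(date_chr)
class(date_conv)            # "Date"
gaps <- diff(date_conv)     # gaps of 10 and 7 days
```

If inherits() returns FALSE on the real column, the as.Date step in the answer is needed.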
data
data <- structure(list(year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2020L,
2020L, 2020L, 2020L, 2020L), group = c("A", "A", "A", "B", "B",
"A", "A", "B", "B", "B"), date = c("2019-07-15", "2019-07-25",
"2019-08-01", "2019-07-15", "2019-07-30", "2020-08-01", "2020-08-03",
"2020-08-01", "2020-08-20", "2020-08-25")),
class = "data.frame", row.names = c(NA,
-10L))
