I have this dataframe:
df1 <- structure(list(ID = c(1, 2, 2, 2, 3, 4, 5, 6, 6, 7, 8, 8, 9,
10), dateA = structure(c(14974, 18628, 18628, 18628, 14882, 16800,
14882, 17835, 17835, 16832, 16556, 16556, 15949, 16801), class = "Date"),
dateB = structure(c(14610, 15340, 15706, 17501, 14730, NA,
14700, 16191, 17106, 16801, 15810, 16436, 14655, 15431), class = "Date"),
dateC = structure(c(18628, 15705, 17500, 18628, 18628, NA,
18628, 17105, 18628, 18628, 16435, 16556, 15706, 18628), class = "Date")), row.names = c(NA,
-14L), class = c("data.table", "data.frame"))
ID dateA dateB dateC
1: 1 2010-12-31 2010-01-01 2021-01-01
2: 2 2021-01-01 2012-01-01 2012-12-31
3: 2 2021-01-01 2013-01-01 2017-11-30
4: 2 2021-01-01 2017-12-01 2021-01-01
5: 3 2010-09-30 2010-05-01 2021-01-01
6: 4 2015-12-31 <NA> <NA>
7: 5 2010-09-30 2010-04-01 2021-01-01
8: 6 2018-10-31 2014-05-01 2016-10-31
9: 6 2018-10-31 2016-11-01 2021-01-01
10: 7 2016-02-01 2016-01-01 2021-01-01
11: 8 2015-05-01 2013-04-15 2014-12-31
12: 8 2015-05-01 2015-01-01 2015-05-01
13: 9 2013-09-01 2010-02-15 2013-01-01
14: 10 2016-01-01 2012-04-01 2021-01-01
I would like to check whether dateA falls within the interval between dateB and dateC. My code:
library(dplyr)
df1 %>%
mutate(match= ifelse(between(dateA, dateB, dateC), 1, 0))
gives:
Error: Problem with `mutate()` column `match`.
i `match = ifelse(between(dateA, dateB, dateC), 1, 0)`.
x Not yet implemented NAbounds=TRUE for this non-numeric and non-character type
If I remove the row that contains NA, the code works:
df1 %>%
slice(-6) %>%
mutate(match= ifelse(between(dateA, dateB, dateC), 1, 0))
Is there a way to keep the row with NA and still run this code?
Answer:
There is some confusion about which between() the OP was using: the input object is a data.table, but the code is dplyr. Both packages export a function called between, so if both are loaded, the version from whichever package was attached last masks the other. dplyr::between is not fully vectorized, as documented in ?dplyr::between:
left, right: Boundary values (must be scalars).
Because of that, one way to use it here is to evaluate it row by row with rowwise():
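df1 %>%
  rowwise() %>%
  # dplyr::between() needs scalar bounds, so compare one row at a time;
  # as.integer() turns TRUE/FALSE into 1/0 (NA stays NA)
  mutate(match = as.integer(dplyr::between(dateA, dateB, dateC))) %>%
  ungroup()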
-output
# A tibble: 14 × 5
ID dateA dateB dateC match
<dbl> <date> <date> <date> <int>
1 1 2010-12-31 2010-01-01 2021-01-01 1
2 2 2021-01-01 2012-01-01 2012-12-31 0
3 2 2021-01-01 2013-01-01 2017-11-30 0
4 2 2021-01-01 2017-12-01 2021-01-01 1
5 3 2010-09-30 2010-05-01 2021-01-01 1
6 4 2015-12-31 NA NA NA
7 5 2010-09-30 2010-04-01 2021-01-01 1
8 6 2018-10-31 2014-05-01 2016-10-31 0
9 6 2018-10-31 2016-11-01 2021-01-01 1
10 7 2016-02-01 2016-01-01 2021-01-01 1
11 8 2015-05-01 2013-04-15 2014-12-31 0
12 8 2015-05-01 2015-01-01 2015-05-01 1
13 9 2013-09-01 2010-02-15 2013-01-01 0
14 10 2016-01-01 2012-04-01 2021-01-01 1
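rowwise() works here because it evaluates the scalar-bound comparison once per row. As a dependency-free sketch of the same idea, the base R code below applies a hypothetical helper, in_range_scalar() (not part of dplyr or data.table), to each row with mapply(). It assumes inclusive bounds and returns NA when a bound is NA, matching the tibble output above.

```r
# Rebuild the OP's date columns (values copied from the printed df1)
dates <- data.frame(
  dateA = as.Date(c("2010-12-31", "2021-01-01", "2021-01-01", "2021-01-01",
                    "2010-09-30", "2015-12-31", "2010-09-30", "2018-10-31",
                    "2018-10-31", "2016-02-01", "2015-05-01", "2015-05-01",
                    "2013-09-01", "2016-01-01")),
  dateB = as.Date(c("2010-01-01", "2012-01-01", "2013-01-01", "2017-12-01",
                    "2010-05-01", NA, "2010-04-01", "2014-05-01",
                    "2016-11-01", "2016-01-01", "2013-04-15", "2015-01-01",
                    "2010-02-15", "2012-04-01")),
  dateC = as.Date(c("2021-01-01", "2012-12-31", "2017-11-30", "2021-01-01",
                    "2021-01-01", NA, "2021-01-01", "2016-10-31",
                    "2021-01-01", "2021-01-01", "2014-12-31", "2015-05-01",
                    "2013-01-01", "2021-01-01"))
)

# Hypothetical helper: inclusive check of one value against scalar bounds.
# && on an NA operand yields NA, so a row with NA bounds gives NA.
in_range_scalar <- function(x, lower, upper) {
  as.integer(x >= lower && x <= upper)
}

# mapply() calls the helper once per row, like rowwise() does
match_rowwise <- mapply(in_range_scalar, dates$dateA, dates$dateB, dates$dateC)
print(match_rowwise)
```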
data.table::between, on the other hand, accepts vector bounds. Judging from the error shown in the OP's post, the between being used was the one from data.table; ?data.table::between documents:
lower - Lower range bound. Either length 1 or same length as x.
upper - Upper range bound. Either length 1 or same length as x.
The class of x can still be an issue, though, even if the documentation suggests otherwise:
x - Any orderable vector, i.e., those with relevant methods for <=, such as numeric, character, Date, etc. in case of between and a numeric vector in case of inrange.
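A short base R check shows why the class can matter even though the documentation lists Date: a Date is stored as a double, but base R's is.numeric() deliberately returns FALSE for it. That this is how data.table decides the input is "non-numeric" is an assumption based on the wording of the error, not something confirmed from data.table's source.

```r
d <- as.Date("2010-12-31")

print(typeof(d))      # stored as "double" under the hood
print(is.numeric(d))  # FALSE: base R defines is.numeric() for Date to return FALSE
print(as.numeric(d))  # 14974 days since 1970-01-01, the value seen in the OP's dput()
```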
Converting the Date columns to numeric makes it work:
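df1 %>%
  # compare the underlying day counts instead of Date objects;
  # as.integer() turns TRUE/FALSE into 1/0
  mutate(match = as.integer(data.table::between(as.numeric(dateA),
    as.numeric(dateB), as.numeric(dateC))))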
-output
ID dateA dateB dateC match
1: 1 2010-12-31 2010-01-01 2021-01-01 1
2: 2 2021-01-01 2012-01-01 2012-12-31 0
3: 2 2021-01-01 2013-01-01 2017-11-30 0
4: 2 2021-01-01 2017-12-01 2021-01-01 1
5: 3 2010-09-30 2010-05-01 2021-01-01 1
6: 4 2015-12-31 <NA> <NA> 1
7: 5 2010-09-30 2010-04-01 2021-01-01 1
8: 6 2018-10-31 2014-05-01 2016-10-31 0
9: 6 2018-10-31 2016-11-01 2021-01-01 1
10: 7 2016-02-01 2016-01-01 2021-01-01 1
11: 8 2015-05-01 2013-04-15 2014-12-31 0
12: 8 2015-05-01 2015-01-01 2015-05-01 1
13: 9 2013-09-01 2010-02-15 2013-01-01 0
14: 10 2016-01-01 2012-04-01 2021-01-01 1
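Note that row 6, whose bounds are both NA, gets 1 here. As that output suggests, with the default NAbounds = TRUE data.table::between treats an NA bound as open on that side. The base R sketch below mimics that behaviour on three of the OP's rows (1, 2 and 6) using a hypothetical helper, between_na_open(); it illustrates the semantics and is not data.table's implementation.

```r
# Three of the OP's rows: inside, outside, and both bounds missing
x     <- as.Date(c("2010-12-31", "2021-01-01", "2015-12-31"))
lower <- as.Date(c("2010-01-01", "2012-01-01", NA))
upper <- as.Date(c("2021-01-01", "2012-12-31", NA))

# Hypothetical helper: an NA bound is treated as unbounded on that side,
# mirroring what data.table::between appears to do with NAbounds = TRUE
between_na_open <- function(x, lower, upper) {
  (is.na(lower) | x >= lower) & (is.na(upper) | x <= upper)
}

res <- as.integer(between_na_open(x, lower, upper))
print(res)
```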
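Digging deeper, the issue is the NAbounds argument, which is TRUE by default. With NAbounds = TRUE, an NA bound is treated as unbounded on that side (which is why row 6 gets 1 in the numeric version above), but, as the error says, that handling is not implemented for this non-numeric type. In the OP's data, row 6 has NA in both dateB and dateC, so calling between() directly on the Date columns fails: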
df1 %>%
mutate(match = data.table::between(dateA, dateB, dateC))
Error: Problem with `mutate()` column `match`.
ℹ `match = data.table::between(dateA, dateB, dateC)`.
✖ Not yet implemented NAbounds=TRUE for this non-numeric and non-character type
Run `rlang::last_error()` to see where the error occurred.
Setting it to FALSE avoids the error; rows with an NA bound then get NA instead of 1:
df1 %>%
  # NAbounds = FALSE: an NA bound gives NA instead of raising an error
  mutate(match = as.integer(data.table::between(dateA, dateB, dateC, NAbounds = FALSE)))
ID dateA dateB dateC match
1: 1 2010-12-31 2010-01-01 2021-01-01 1
2: 2 2021-01-01 2012-01-01 2012-12-31 0
3: 2 2021-01-01 2013-01-01 2017-11-30 0
4: 2 2021-01-01 2017-12-01 2021-01-01 1
5: 3 2010-09-30 2010-05-01 2021-01-01 1
6: 4 2015-12-31 <NA> <NA> NA
7: 5 2010-09-30 2010-04-01 2021-01-01 1
8: 6 2018-10-31 2014-05-01 2016-10-31 0
9: 6 2018-10-31 2016-11-01 2021-01-01 1
10: 7 2016-02-01 2016-01-01 2021-01-01 1
11: 8 2015-05-01 2013-04-15 2014-12-31 0
12: 8 2015-05-01 2015-01-01 2015-05-01 1
13: 9 2013-09-01 2010-02-15 2013-01-01 0
14: 10 2016-01-01 2012-04-01 2021-01-01 1
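For Date columns, a plain vectorized comparison in base R gives the same result as NAbounds = FALSE on this data, without depending on which between() happens to be attached: comparing against an NA bound yields NA. This sketch swaps between() for the >= and <= operators (assuming inclusive bounds) and runs on three of the OP's rows (1, 2 and 6).

```r
x     <- as.Date(c("2010-12-31", "2021-01-01", "2015-12-31"))
lower <- as.Date(c("2010-01-01", "2012-01-01", NA))
upper <- as.Date(c("2021-01-01", "2012-12-31", NA))

# Inclusive on both ends; NA bounds propagate to an NA result
res <- as.integer(x >= lower & x <= upper)
print(res)
```

Inside the OP's pipeline this would be mutate(match = as.integer(dateA >= dateB & dateA <= dateC)). One caveat: if only one bound is NA and the other comparison is already FALSE, & returns FALSE rather than NA.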
Alternatively, still with data.table::between, pass NAbounds an NA converted to Date with as.Date(), which gives the same result:
df1 %>%
  # NAbounds given as a Date-class NA; row 6 again gets NA
  mutate(match = as.integer(data.table::between(dateA, dateB, dateC,
    NAbounds = as.Date(NA))))
ID dateA dateB dateC match
1: 1 2010-12-31 2010-01-01 2021-01-01 1
2: 2 2021-01-01 2012-01-01 2012-12-31 0
3: 2 2021-01-01 2013-01-01 2017-11-30 0
4: 2 2021-01-01 2017-12-01 2021-01-01 1
5: 3 2010-09-30 2010-05-01 2021-01-01 1
6: 4 2015-12-31 <NA> <NA> NA
7: 5 2010-09-30 2010-04-01 2021-01-01 1
8: 6 2018-10-31 2014-05-01 2016-10-31 0
9: 6 2018-10-31 2016-11-01 2021-01-01 1
10: 7 2016-02-01 2016-01-01 2021-01-01 1
11: 8 2015-05-01 2013-04-15 2014-12-31 0
12: 8 2015-05-01 2015-01-01 2015-05-01 1
13: 9 2013-09-01 2010-02-15 2013-01-01 0
14: 10 2016-01-01 2012-04-01 2021-01-01 1
