Home > Back-end >  Converting SQL-Interval-String to minutes using R
Converting SQL-Interval-String to minutes using R

Time:01-15

I am working with R where I have a variable '2 month 3 day 6 hour 70 minute' as string. The variable changes over time and therefore does not have the same length/structure. I need this variable to do a query on a PostgreSQL database by casting it to an interval. This works just fine.

Now I need this interval/string-variable as integer in minutes to do some mathematical calculations.

I thought of using sqldf the following:

library(sqldf)
my_interval = '2 month 3 day 6 hour 70 minute'
interval_minutes <- sqldf(paste("SELECT EXTRACT(EPOCH FROM '",my_interval,"'::INTERVAL)/60"))
interval_minutes_novar <- sqldf("SELECT EXTRACT(EPOCH FROM '2 month 3 day 6 hour 70 minute'::INTERVAL)/60")

but am getting Error: near "FROM": syntax error. From my research I know that sqldf uses SQLite, which does not support EXTRACT().

How can I convert a SQL-Interval to minutes using R?

CodePudding user response:

1) sqldf/gsubfn Using gsubfn replace each word in my_interval with *, the appropriate number of minutes and . Remove any trailing and spaces and then either parse and evaluate mins or substitute mins into the sql statement. There are 365.25 / 12 days in the average month over 4 calendar years, having one leap year, but if you want to get the same answer as PostgreSQL replace 365.25 / 12 with 30, as noted in the comments.

library(sqldf)  # this also pulls in gsubfn

# input
my_interval = '2 month 3 day 6 hour 70 minute'

L <- list(minute = "  ", hour = "*60  ", day = "*60*24  ", 
       month = "*365.25 * 60 * 24 /12  ")
mins <- my_interval |>
  gsubfn(pattern = "\\w ", replacement = L) |> 
  trimws(whitespace = "[  ]")

eval(parse(text = mins))
## [1] 92410

fn$sqldf("select $mins mins")
##    mins
## 1 92410

2) Base R This is a base R solution. Extract the numbers and words into separate vectors, translate the words to the appropriate factors and take their inner product. The discussion about 30 days months in (1) applies here too.

v <- c(minute = 1, hour = 60, day = 60 * 24, month = 365.25 * 60 * 24 /12)
nums <- my_interval |>
  gsub(pattern = "[a-z]", replacement = "") |>
  textConnection() |>
  scan(quiet = TRUE)
words <- my_interval |>
  gsub(pattern = "\\d", replacement = "") |>
  textConnection() |>
  scan(what = "", quiet = TRUE)
sum(v[words] * nums)
## [1] 92410

3) lubridate lubridate duration objects can be used.

library(lubridate)
as.numeric(duration(my_interval), "minute")
## [1] 92410

Although lubridate does not handle 30 day months (and Hadley says it is not planned) we can preprocess my_interval to get the effect.

library(gsubfn)
library(lubridate)

my_interval |>
 gsubfn(pattern = "(\\d )  month", replacement = ~paste(30*as.numeric(x),"day")) |>
 duration() |>
 as.numeric("minute")
## [1] 91150

CodePudding user response:

Adapting my answer here to this, I'll restate a rather gaping problem with this conversion: convert "month" into "seconds" is not constant, as months vary between 28-31 days. If we assume 30, though, for the sake of arguments, then:

func <- function(x, ptn) {
  out <- gsub(paste0(".*?\\b([0-9.] )\\s*", ptn, ".*"), "\\1", x, ignore.case = TRUE)
  ifelse(out == x, NA, out)
}
res1 <- lapply(c(mon = "month", day = "day", hr = "hour", min = "minute"),
               function(ptn) as.numeric(func(my_interval, ptn)))
res2 <- lapply(res1, function(z) ifelse(is.na(z), 0, z))
res2
# $mon
# [1] 2
# $day
# [1] 3
# $hr
# [1] 6
# $min
# [1] 70
86400 * (res2$mon*31   res2$day)   3600*res2$mon   60*res2$hr
# [1] 5623560

Because I'm using lapply and simple vectorizable operations here, this also works if my_interval is more than one string (of similar format). It is robust to missing variables (presumed 0), and can include "year" (albeit with leap-year inaccuracies) and/or "second" if desired.

intervals <- c("2 month 3 day 6 hour 70 minute", "1 year", "1 hour 1 second")
res1 <- lapply(c(yr = "year", mon = "month", day = "day", hr = "hour", min = "minute", sec = "second"),
               function(ptn) as.numeric(func(intervals, ptn)))
res2 <- lapply(res1, function(z) ifelse(is.na(z), 0, z))
str(res2)
# List of 6
#  $ yr : num [1:3] 0 1 0
#  $ mon: num [1:3] 2 0 0
#  $ day: num [1:3] 3 0 0
#  $ hr : num [1:3] 6 0 1
#  $ min: num [1:3] 70 0 0
#  $ sec: num [1:3] 0 0 1
86400 * (res2$yr*365   res2$mon*31   res2$day)   3600*res2$mon   60*res2$hr   res2$sec
# [1] 5.6e 06 3.2e 07 6.1e 01

CodePudding user response:

My workaround is to use my PostgreSQL connection to do it:

library(sf)
library(RPostgres)

my_postgresql_connection <- dbConnect(Postgres(), dbname = "my_db", host = "my_host", port = 1234, user = "my_user", password = "my_password")
my_interval = '2 month 3 day 6 hour 70 minute'
my_dataframe <- st_read(my_postgresql_connection, query = paste("SELECT EXTRACT(EPOCH FROM '",my_interval,"'::INTERVAL)/60 as minutes"))
my_interval_in_minutes <- as.double(my_dataframe$minutes[1])
  •  Tags:  
  • Related