Basically, I have a table like this
| id | timestamp | state | minutes |
|---|---|---|---|
| 1 | 7/12/2021, 17:38 | off | 14.54 |
| 1 | 7/12/2021, 17:53 | running | 8.39 |
| 1 | 7/12/2021, 18:01 | off | 8.12 |
| 1 | 9/12/2021, 00:04 | running | big value here |
The table is ordered by id and then by timestamp ascending. Each id represents a machine and in the first row, for example, the machine was off for 14.54 minutes, from 7/12/2021 17:38 till 7/12/2021 17:53 (2nd row). At 17:53 the machine started running for 8.39 minutes until 7/12/2021 18:01 etc...
What I do then is sum all the minutes and group those by day and state, so I know how much time each machine spent turned off and running, for each day. The problem is that, when the consecutive timestamps are from different consecutive days, I get a wrong sum value for that day. In the example of the table above, I would get that the sum of total minutes in the off state would be 14.54 8.39 8.12 big value here.
Ideally what should happen is a group similar to this (considering only for the off state)
| id | day | state | minutes |
|---|---|---|---|
| 1 | 7/12/2021 | off | 22.66 |
| 1 | 8/12/2021 | off | 1440 |
| 1 | 9/12/2021 | off | 4 |
I was thinking maybe could artificially add timestamps on the result table like "17/12/2021 24:00" so I always get the intervals cut correctly, for each day, when grouping.
Can you please help me? It is really important!! Thanks
CodePudding user response:
Something like this:
with Cal as (
select cast(DayStarts as date) as Date, DayStarts, DayStarts interval '1 day' - interval '1 second' as DayEnds
from (
select CAST('2021-12-07' AS timestamp(0)) (n || ' day')::INTERVAL as DayStarts
from generate_series(0, 10) n
) Cal1
),
MyTbl as (
select
Tbl.id
, tbl.timestamp as StartTS
, tbl.state
, timestamp (tbl.minutes * interval '1 minute') as EndTS
from Tbl
)
select
C.Date
, T.State
, sum(extract(Epoch from (least(C.DayEnds, T.EndTS)
-
greatest(C.DaySTarts, T.StartTS) interval '1 second'))/60) as minutes
from Cal C
inner join
MyTbl T
on C.DayEnds>=T.StartTS
and C.DaySTarts<=T.EndTS
group by C.Date, T.State
order by C.Date, T.State
Cal is the calendar, produced for the next 10 days starting from 2021-12-07, it returns a date column, and the first/last moments (timestamps) of the date. Tbl is your table, MyTbl is that one plus an end timestamp (EndTs) by adding the minutes to the timestamp column. We then join these tables looking for 'overlaps'; then for each overlap, we calculate the minutes of the overlap, and then aggregate by date and state.
The last second of each day is not correctly included, but if rounded to minutes it would be fine. It shouldn't be difficult to change it to use < NextDate, to make it perfectly accurate.
