my data looks something like this:
| CASE_TIMESTAMP | GROUP | |
|---|---|---|
| 0 | 2017-12-26 16:12:09 00:00 | A |
| 1 | 2017-12-26 16:12:44 00:00 | A |
| 2 | 2020-04-21 07:00:00 00:00 | A |
| 3 | 2020-07-01 00:05:35 00:00 | A |
| 4 | 2020-08-06 07:00:00 00:00 | A |
| 5 | 2020-08-06 07:00:00 00:00 | A |
| 6 | 2020-08-06 07:00:00 00:00 | A |
| 7 | 2020-08-25 07:00:00 00:00 | B |
| 8 | 2020-09-22 07:00:00 00:00 | B |
| 9 | 2020-09-22 07:00:00 00:00 | B |
| 10 | 2020-12-04 08:00:00 00:00 | B |
| 11 | 2020-12-04 08:00:00 00:00 | B |
| 12 | 2020-12-07 08:00:00 00:00 | B |
| 13 | 2020-12-07 08:00:00 00:00 | B |
| 14 | 2020-12-07 08:00:00 00:00 | B |
| 15 | 2020-12-08 08:00:00 00:00 | B |
| 16 | 2020-12-08 08:00:00 00:00 | B |
| 17 | 2020-12-08 08:00:00 00:00 | B |
Need to drop cases that occurred before a gap of more than one day, so in group a all cases before 2020-08-06 and in B all cases before 2020-12-07.
Think I need a window function, but don't know how to calculate gaps and then drop all before, any ideas?
PS.I'm on snowflake
CodePudding user response:
Using QUALIFY and windowed MAX to find the latest CASE_TIMESTAMP per GRR:
CREATE TABLE t(CASE_TIMESTAMP TIMESTAMP, GRP VARCHAR)
AS
SELECT '2017-12-26 16:12:09 00:00','A'
UNION ALL SELECT '2017-12-26 16:12:44 00:00','A'
UNION ALL SELECT '2020-04-21 07:00:00 00:00','A'
UNION ALL SELECT '2020-07-01 00:05:35 00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00 00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00 00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00 00:00','A'
UNION ALL SELECT '2020-08-25 07:00:00 00:00','B'
UNION ALL SELECT '2020-09-22 07:00:00 00:00','B'
UNION ALL SELECT '2020-09-22 07:00:00 00:00','B'
UNION ALL SELECT '2020-12-04 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-04 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00 00:00','B';
Query:
SELECT *
FROM t
QUALIFY CASE_TIMESTAMP >= MAX(CASE_TIMESTAMP) OVER(PARTITION BY GRP)
- INTERVAL '1 days';
Output:

