Home > OS >  Determining how far away is the next ID
Determining how far away is the next ID

Time:01-18

So I have some data, subset as follows:

ID  data start_time 
001    X 2021-12-29 10:54:12.429  0000
002    Y 2022-01-16 05:07:55.708  0000 
003    Y 2021-12-31 12:25:12.980  0000
002    A 2022-01-03 12:49:41.866  0000
001    A 2021-12-30 16:32:13.736  0000
001    A 2022-01-17 10:10:10.736  0000

I would like to determine in minutes, the time difference between a given ID and the next occurrence in the dataframe, in order of start_time. So if an ID appears at 12:00 and 12:01, I would like the ID to show the time of the next entry as well as the diff in minutes, using SQL/Snowflake. CTE preferred.

The following fields should be added:

  • next_timestamp: The timestamp of the following entry
  • time_diff: The difference in minutes between start_time and next_timestamp.
  • entry_order: The number of how many of this ID has been.

Expected output:

ID  data start_time                       next_timestamp                 time_diff  entry_order
001    X 2021-12-29 10:54:12.429  0000    2021-12-30 16:32:13.736  0000  1778       1
001    A 2021-12-30 16:32:13.736  0000    2022-01-17 10:10:10.736  0000  25537      2
003    Y 2021-12-31 12:25:12.980  0000    NULL                           NULL       1
002    A 2022-01-03 12:49:41.866  0000    2022-01-16 05:07:55.708  0000  18258      1
002    Y 2022-01-16 05:07:55.708  0000    NULL                           NULL       2
001    A 2022-01-17 10:10:10.736  0000    NULL                           NULL       3

Notice, the resulting output is ordered by the timestamp, ascending.

CodePudding user response:

The LEAD function can be used to find the next start_time per ID.

And the ROW_NUMBER function can return a unique sequencial number per ID.

SELECT *
, LEAD(start_time) OVER (PARTITION BY ID ORDER BY start_time) AS next_timestamp
, DATEDIFF(minute, start_time, LEAD(start_time) OVER (PARTITION BY ID ORDER BY start_time)) AS time_diff
, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY start_time) AS entry_order
FROM your_table
ORDER BY start_time

CodePudding user response:

Using LEAD, DATEDIFF and ROW_NUMBER:

SELECT *,
   LEAD(start_time) OVER(PARTITITON BY ID ORDER BY start_time) AS next_timestamp,
   DATEDIFF(seconds, start_time, next_timestamp) SA time_difference,
   ROW_NUMBER() OVER(PARTITITON BY ID ORDER BY start_time) AS entry_order
FROM tab
  •  Tags:  
  • Related