Home > Enterprise >  Snowflake Query to find what date is missing
Snowflake Query to find what date is missing

Time:02-02

I have a database which has following columns

id int
start datetime
end datetime
<other columns

The start and END is datetime of following format YYYY-MM-DD H24:MI:SS.MI. In a perfect world the END of yesterday will become start of today. However, there are times when the job doesn't run and it skips making an entry into the system. I want to write a query which can tell me which date is missing from the table (time doesn't matter).

For example, consider this

START   END
2022-01-31 H24:MI:SS.MI 2022-02-01 H24:MI:SS.MI
2022-01-29 H24:MI:SS.MI 2022-01-30 H24:MI:SS.MI

I want to write a query which should be able to tell that start of 2022-01-30 is missing from the table.

EDIT Bare minimum, the expected output should be 2022-01-30 (i.e., start is missing this date)

CodePudding user response:

I think you are looking for end_dates that don't have a matching start_date — except the last end_date:

with data as (
select $1 start_date, $2 end_date
from values('2022-01-31', '2022-02-01'), ('2022-01-29', '2022-01-30')   
)

select end_date 
from data
where end_date not in (select start_date from data)
and end_date not in (select max(end_date) from data)

CodePudding user response:

if you have some date with a gap like (I have used number instead of dates, just because they are small to type, and read, but the idea works across the types):

SELECT * FROM VALUES
    (1,1,2),
    (1,2,3),
    (1,4,5),
    (1,5,6)
    v(job_id, start_id, end_id)

then you can get the (2,3) row AND/OR the (4,5) row by using a LAG/LEAD function where the prior end is not the current start, OR the next start is not the current end.

like:

SELECT v.*,
    LAG(end_id, 1, start_id) OVER (PARTITION BY job_id order by start_id) as prior_end,
    LEAD(start_id, 1, end_id) OVER (PARTITION BY job_id order by start_id) as next_start
FROM VALUES
    (1,1,2),
    (1,2,3),
    (1,4,5),
    (1,5,6)
    v(job_id, start_id, end_id)
QUALIFY (start_id != prior_end) OR (end_id != next_start)
ORDER BY 1,2

gives:

JOB_ID START_ID END_ID PRIOR_END NEXT_START
1 2 3 2 4
1 4 5 3 5

and if you want it "cleaner"

SELECT v.*
FROM VALUES
    (1,1,2),
    (1,2,3),
    (1,4,5),
    (1,5,6)
    v(job_id, start_id, end_id)
QUALIFY (
    start_id != LAG(end_id, 1, start_id) OVER (PARTITION BY job_id order by start_id)
   ) OR (
    end_id != LEAD(start_id, 1, end_id) OVER (PARTITION BY job_id order by start_id)
   )
ORDER BY 1,2
JOB_ID START_ID END_ID
1 2 3
1 4 5
  •  Tags:  
  • Related