I have a table with data like below:
Log Table:
| User Id | Login Date |
|---|---|
| 1 | 2022-01-03 |
| 1 | 2022-01-04 |
| 1 | 2022-01-10 |
| 1 | 2022-01-11 |
| 1 | 2022-01-12 |
| 1 | 2022-01-23 |
| 1 | 2022-01-25 |
| 1 | 2022-01-26 |
| 1 | 2022-01-27 |
| 1 | 2022-01-28 |
What I'm trying to do is to create a query that return rows of the latest logins by consecutive dates with var_date as parameter.
If var_date is 2022-01-29, then the result is:
| User Id | Login Date |
|---|---|
| 1 | 2022-01-25 |
| 1 | 2022-01-26 |
| 1 | 2022-01-27 |
| 1 | 2022-01-28 |
If var_date is 2022-01-30, then no result is returned, since 2022-01-29 is not in the table.
If var_date is 2022-01-24, then the query will return row with 2022-01-23 as login date.
How am I to do this in SQLite?
Thank you.
CodePudding user response:
This question is a variant of gaps and islands, with the islands being clusters of records per user with continuous dates. Here is one approach using analytic functions:
WITH cte AS (
SELECT *, CASE WHEN julianday(LoginDate) -
julianday(LAG(LoginDate) OVER (PARTITION BY UserID
ORDER BY LoginDate))
> 1 THEN 1 ELSE 0 END AS counter
FROM yourTable
),
cte2 AS (
SELECT *, SUM(counter) OVER (PARTITION BY UserID ORDER BY LoginDate) AS grp
FROM cte
)
SELECT UserID, LoginDate
FROM cte2 t1
WHERE LoginDate < '2022-01-29' AND
grp = (SELECT t2.grp FROM cte2 t2
WHERE t2.UserID = t1.UserID AND t2.LoginDate = '2022-01-28');
Demo
The two CTEs generate a pseudo date group for each cluster per user. The final query returns all records less than the target date for which the group value is the same as the immediately preceding date. Hence, for dates having no immediate record for a given user, the query will return empty set.
CodePudding user response:
What you can do is use lag to get the difference between each date and the previous one datediff; with that you can find all the dates between your parameter date and the latest date with datediff different of 1 day:
with u as
(select UserId,
LoginDate,
julianday(LoginDate) - julianday(lag(LoginDate) over(order by LoginDate)) as datediff
from log
WHERE LoginDate < '2022-01-29')
select UserId, LoginDate from u
where LoginDate >= (SELECT max(LoginDate) from u where datediff <> 1)
