I have a table that stores people's sign-ins and sign-outs in rooms. One of them logs in and tells the system how many people they are in total. As long as they don't log out, the value in the LogOut column remains NULL. This is a query for a report I have to do but I am stuck.
I want to find the maximum concurrent number of people per day in each room in a given time span. However, I also want to count the number of people that are NULL, since they are actually still logged in.
I created this query based on several answers on Stackoverflow. Unfortunately I haven't figured out how to get the number of people with NULL values on a day when no one logs into that room. The reason might be the cumulative total, but how do I work around this issue?
Here is an example table:
CREATE TABLE Registrations(
[RoomName] VARCHAR(32),
[LogIn] DATETIME,
[LogOut] DATETIME,
[NumberOfPeople] INT
)
INSERT INTO Registrations (RoomName, LogIn, LogOut, NumberOfPeople)
VALUES
('MainHall', '2022-01-31 08:00:00.000', NULL, 5),
('MainHall', '2022-01-31 08:00:00.000', '2022-01-31 10:00:00.000', 5),
('MainHall', '2022-01-31 09:00:00.000', '2022-01-31 11:00:00.000', 2),
('MainHall', '2022-01-31 12:00:00.000', '2022-01-31 14:00:00.000', 2),
('SmallRoom', '2022-01-31 08:00:00.000', NULL, 3),
('SmallRoom', '2022-01-31 10:00:00.000', '2022-01-31 12:00:00.000', 3),
('SmallRoom', '2022-01-31 11:00:00.000', '2022-01-31 14:00:00.000', 3),
('MainHall', '2022-02-01 08:00:00.000', NULL, 5),
('MainHall', '2022-02-01 08:00:00.000', '2022-01-31 10:00:00.000', 5),
('MainHall', '2022-02-01 09:00:00.000', '2022-01-31 11:00:00.000', 2),
('MainHall', '2022-02-01 12:00:00.000', '2022-01-31 14:00:00.000', 2)
This is how the first entries of MainHall on 2022-01-31 should look like:
-- A |-------------------
-- B |-----|
-- C |-----|
-- D |-----|
This is the result I want to get:
Day | Room | MaxNumberOfConcurrentLoggedInPeople
------------ ------------- -------------------------------------
2022-01-31 | MainHall | 12
2022-01-31 | SmallRoom | 9
2022-02-01 | MainHall | 17
2022-02-01 | SmallRoom | 3
This is my query:
SELECT ct.RoomName,
ct.DatePeak,
ct.RegistrationPeak
FROM
(
SELECT cte.RoomName,
CONVERT(DATE, cte.DatePeak) AS DatePeak,
MAX(cte.RegistrationPeak) AS RegistrationPeak
FROM
(
SELECT MarkedDateTime.RoomName,
SUM(SUM(MarkedDateTime.NumberOfPeople)) OVER (PARTITION BY MarkedDateTime.RoomName ORDER BY MarkedDateTime.DatePeak) AS RegistrationPeak,
CONVERT(DATE, MarkedDateTime.DatePeak) AS DatePeak
FROM
(
SELECT RoomName,
NumberOfPeople,
LogIn AS DatePeak
FROM Registrations
WHERE (LogOut >= '2022-01-31 00:00:00.000' OR LogOut IS NULL)
AND LogIn < DATEADD(DAY, 1, '2022-02-01 00:00:00.000')
UNION ALL
SELECT RoomName,
-NumberOfPeople,
COALESCE(LogOut, DATEADD(MILLISECOND, - 3, DATEADD(DAY, 1, '2022-02-01 00:00:00.000'))) AS DatePeak
FROM Registrations
WHERE (LogOut >= '2022-01-31 00:00:00.000' OR LogOut IS NULL)
AND LogIn < DATEADD(DAY, 1, '2022-02-01 00:00:00.000')
) AS MarkedDateTime
GROUP BY RoomName, DatePeak
) AS cte
GROUP BY DatePeak, RoomName
) AS ct
WHERE ct.DatePeak <= '2022-02-01 00:00:00.000'
AND ct.DatePeak >= '2022-01-31 00:00:00.000'
EDIT: I added the solution and a small visual help.
CodePudding user response:
There is a lot of code to follow. I have commented it up to explain what it is doing, but in summary: I am calculating each second that room is in use. This means that the accuracy of this report is down to a second (no less). PS I believe that there is a problem with your supplied data, so I have amended it.
INSERT INTO Registrations (RoomName, LogIn, LogOut, NumberOfPeople)
VALUES ('MainHall', '20220131 08:00:00.000', NULL, 5),
('MainHall', '20220131 08:00:00.000', '20220131 10:00:00.000', 5),
('MainHall', '20220131 09:00:00.000', '20220131 11:00:00.000', 2),
('MainHall', '20220131 12:00:00.000', '20220131 14:00:00.000', 2),
('SmallRoom', '20220131 08:00:00.000', NULL, 3),
('SmallRoom', '20220131 10:00:00.000', '20220131 12:00:00.000', 3),
('SmallRoom', '20220131 11:00:00.000', '20220131 14:00:00.000', 3),
('MainHall', '20220201 08:00:00.000', NULL, 5),
('MainHall', '20220201 08:00:00.000', '20220201 10:00:00.000', 5),
('MainHall', '20220201 09:00:00.000', '20220201 11:00:00.000', 2),
('MainHall', '20220201 12:00:00.000', '20220201 14:00:00.000', 2)
-- tally table so we can calculate all the dates we need
SELECT TOP 86400 -- seconds in a day
IDENTITY(INT,1,1) AS N
INTO #Tally
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
;
-- need to figure out how how many days we want to process
DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
DECLARE @NumberOfDays INT
SELECT @StartDate = min(dateadd(day, datediff(day,0,d),0)),
@EndDate = max(dateadd(day, datediff(day,0,d),0))
FROM ( SELECT MIN(LogIn) d
FROM Registrations
UNION
SELECT MAX(LogIn)
FROM Registrations
UNION
SELECT MIN(coalesce(LogOut, getdate()))
FROM Registrations
UNION
SELECT MAX(coalesce(LogOut, getdate()))
FROM Registrations ) x
SET @NumberOfDays = DATEDIFF( DAY, @StartDate, @EndDate ) 1
--SELECT @StartDate, @EndDate, @NumberOfDays
IF(SELECT OBJECT_ID('tempdb..#Days')) IS NOT NULL
DROP TABLE #Days
CREATE TABLE #Days(N INT, D DATETIME);
-- The cross join is slow so, if there are only a few days dont produce a cartesean product
IF((SELECT COUNT(1) FROM #Tally) >= @NumberOfDays) BEGIN
INSERT INTO #Days(N,D)
SELECT [N] = T1.N ,
[D] = DATEADD(DAY, T1.N-1, @StartDate)
FROM #Tally T1
WHERE T1.N <= @NumberOfDays
ORDER BY T1.N
END
ELSE BEGIN
INSERT INTO #Days(N,D)
SELECT [N] = ROW_NUMBER() OVER(ORDER BY T1.N, T2.N) ,
[D] = DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY T1.N, T2.N)-1, @StartDate)
FROM #Tally T1
CROSS JOIN #Tally T2
END
--SELECT * FROM #Days
-- Ensure there are no missing rooms or days
; WITH base as (
SELECT DISTINCT r.RoomName, d.D
FROM Registrations r
CROSS JOIN #Days d
)
--SELECT * FROM base ORDER BY 1,2 /*
-- calculate all the times down to the second that a room can be in use
, RoomSlot as (
SELECT b.RoomName, DATEADD(SECOND, t.N -1, b.D) D
FROM base b
CROSS JOIN #Tally t
)
--SELECT * FROM RoomSlot ORDER BY 1,2 /*
-- find each second a room is in use
, data as (
SELECT rs.*, r.NumberOfPeople, r.id, r.[LogIn], r.[Logout]
FROM RoomSlot rs
INNER JOIN Registrations r ON r.RoomName = rs.RoomName
AND r.LogIn <= rs.D
AND COALESCE(r.LogOut, GETDATE()) > rs.D
)
--SELECT * FROM data ORDER BY 1,2 /*
-- calculate all the people using a room for each second it is in use
, results as (
SELECT RoomName, D, SUM(NumberOfPeople) NumberOfPeople
FROM data
GROUP BY RoomName, D
)
--SELECT * FROM results ORDER BY 1,2,3 /*
-- get the max usage
SELECT RoomName, DATEADD(DAY, DATEDIFF(DAY,0,D),0), MAX(NumberOfPeople)
FROM results
GROUP BY RoomName, DATEADD(DAY, DATEDIFF(DAY,0,D),0)
ORDER BY DATEADD(DAY, DATEDIFF(DAY,0,D),0), RoomName
-- */ -- */ -- */ -- */ -- */
/*
Actual output
Day | Room | MaxNumberOfConcurrentLoggedInPeople
------------ ------------- -------------------------------------
2022-01-31 | MainHall | 12
2022-01-31 | SmallRoom | 9
2022-02-01 | MainHall | 17
2022-02-01 | SmallRoom | 3
*/
CodePudding user response:
Your data doesn't seem to quite match your expected output, and some of the data doesn't even make sense (logouts before logins), but I think you can solve it like this:
- Unpivot the log-ins and -outs into separate events, with the quantity positive or negative respectively. Exclude nulls.
- Calculate a running sum of the quantity, partitioned by room and date, ordered by time.
- Then simply group by room and day, and take the maximum of the running sum.
SELECT
evnt.Day,
evnt.RoomName,
MaxNumberOfConcurrentLoggedInPeople = MAX(evnt.RunningSum)
FROM (
SELECT
r.RoomName,
Day = CAST(v.EventDateTime AS date),
v.DiffQuantity,
RunningSum = SUM(v.DiffQuantity) OVER (PARTITION BY r.RoomName, CAST(v.EventDateTime AS date) ORDER BY v.EventDateTime ROWS UNBOUNDED PRECEDING)
FROM Registrations r
CROSS APPLY (VALUES
(r.LogIn, r.NumberOfPeople),
(r.LogOut, -r.NumberOfPeople)
) v(EventDateTime, DiffQuantity)
WHERE v.EventDateTime IS NOT NULL
) evnt
GROUP BY
evnt.RoomName,
evnt.Day;
