Home > Software engineering >  T-SQL: Find maximum concurrent number of people per day with null values
T-SQL: Find maximum concurrent number of people per day with null values

Time:02-07

I have a table that stores people's sign-ins and sign-outs in rooms. One of them logs in and tells the system how many people they are in total. As long as they don't log out, the value in the LogOut column remains NULL. This is a query for a report I have to do but I am stuck.

I want to find the maximum concurrent number of people per day in each room in a given time span. However, I also want to count the number of people that are NULL, since they are actually still logged in.

I created this query based on several answers on Stackoverflow. Unfortunately I haven't figured out how to get the number of people with NULL values ​​on a day when no one logs into that room. The reason might be the cumulative total, but how do I work around this issue?

Here is an example table:

CREATE TABLE Registrations(
    [RoomName] VARCHAR(32),
    [LogIn] DATETIME,
    [LogOut] DATETIME,
    [NumberOfPeople] INT
)

INSERT INTO Registrations (RoomName, LogIn, LogOut, NumberOfPeople)
VALUES
('MainHall', '2022-01-31 08:00:00.000', NULL, 5),
('MainHall', '2022-01-31 08:00:00.000', '2022-01-31 10:00:00.000', 5),
('MainHall', '2022-01-31 09:00:00.000', '2022-01-31 11:00:00.000', 2),
('MainHall', '2022-01-31 12:00:00.000', '2022-01-31 14:00:00.000', 2),
('SmallRoom', '2022-01-31 08:00:00.000', NULL, 3),
('SmallRoom', '2022-01-31 10:00:00.000', '2022-01-31 12:00:00.000', 3),
('SmallRoom', '2022-01-31 11:00:00.000', '2022-01-31 14:00:00.000', 3),
('MainHall', '2022-02-01 08:00:00.000', NULL, 5),
('MainHall', '2022-02-01 08:00:00.000', '2022-01-31 10:00:00.000', 5),
('MainHall', '2022-02-01 09:00:00.000', '2022-01-31 11:00:00.000', 2),
('MainHall', '2022-02-01 12:00:00.000', '2022-01-31 14:00:00.000', 2)

This is how the first entries of MainHall on 2022-01-31 should look like:

-- A    |-------------------
-- B    |-----|
-- C       |-----|
-- D                |-----|

This is the result I want to get:

Day         |    Room     | MaxNumberOfConcurrentLoggedInPeople 
------------ ------------- -------------------------------------
2022-01-31  |  MainHall   |                 12 
2022-01-31  |  SmallRoom  |                 9
2022-02-01  |  MainHall   |                 17
2022-02-01  |  SmallRoom  |                 3

This is my query:

SELECT  ct.RoomName,
        ct.DatePeak,
        ct.RegistrationPeak
FROM
(
    SELECT  cte.RoomName,
            CONVERT(DATE, cte.DatePeak) AS DatePeak,
            MAX(cte.RegistrationPeak) AS RegistrationPeak
    FROM
        (
        SELECT  MarkedDateTime.RoomName,
                SUM(SUM(MarkedDateTime.NumberOfPeople)) OVER (PARTITION BY MarkedDateTime.RoomName ORDER BY MarkedDateTime.DatePeak) AS RegistrationPeak,
                CONVERT(DATE, MarkedDateTime.DatePeak) AS DatePeak
        FROM
            (
                SELECT      RoomName,
                            NumberOfPeople,
                            LogIn AS DatePeak

                FROM        Registrations

                WHERE       (LogOut >=  '2022-01-31 00:00:00.000' OR LogOut IS NULL)
                AND         LogIn < DATEADD(DAY, 1, '2022-02-01 00:00:00.000')

                UNION ALL

                SELECT      RoomName,
                            -NumberOfPeople,
                            COALESCE(LogOut, DATEADD(MILLISECOND, - 3, DATEADD(DAY, 1, '2022-02-01 00:00:00.000'))) AS DatePeak

                FROM        Registrations

                WHERE       (LogOut >=  '2022-01-31 00:00:00.000' OR LogOut IS NULL)
                AND         LogIn < DATEADD(DAY, 1, '2022-02-01 00:00:00.000')
            ) AS MarkedDateTime

        GROUP BY RoomName, DatePeak
        ) AS cte

    GROUP BY DatePeak, RoomName
    ) AS ct

WHERE   ct.DatePeak <= '2022-02-01 00:00:00.000'
AND     ct.DatePeak >= '2022-01-31 00:00:00.000'

EDIT: I added the solution and a small visual help.

CodePudding user response:

There is a lot of code to follow. I have commented it up to explain what it is doing, but in summary: I am calculating each second that room is in use. This means that the accuracy of this report is down to a second (no less). PS I believe that there is a problem with your supplied data, so I have amended it.

INSERT INTO Registrations (RoomName, LogIn, LogOut, NumberOfPeople)
VALUES ('MainHall', '20220131 08:00:00.000',  NULL, 5),
       ('MainHall', '20220131 08:00:00.000',  '20220131 10:00:00.000', 5),
       ('MainHall', '20220131 09:00:00.000',  '20220131 11:00:00.000', 2),
       ('MainHall', '20220131 12:00:00.000',  '20220131 14:00:00.000', 2),
       ('SmallRoom', '20220131 08:00:00.000', NULL, 3),
       ('SmallRoom', '20220131 10:00:00.000', '20220131 12:00:00.000', 3),
       ('SmallRoom', '20220131 11:00:00.000', '20220131 14:00:00.000', 3),
       ('MainHall', '20220201 08:00:00.000',  NULL, 5),
       ('MainHall', '20220201 08:00:00.000',  '20220201 10:00:00.000', 5),
       ('MainHall', '20220201 09:00:00.000',  '20220201 11:00:00.000', 2),
       ('MainHall', '20220201 12:00:00.000',  '20220201 14:00:00.000', 2)

-- tally table so we can calculate all the dates we need
SELECT  TOP 86400 -- seconds in a day
        IDENTITY(INT,1,1) AS N
INTO    #Tally
FROM    master.sys.all_columns ac1
          CROSS JOIN master.sys.all_columns ac2
;

--  need to figure out how how many days we want to process
DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
DECLARE @NumberOfDays INT

SELECT  @StartDate = min(dateadd(day, datediff(day,0,d),0)), 
        @EndDate = max(dateadd(day, datediff(day,0,d),0))
FROM (  SELECT  MIN(LogIn) d
        FROM    Registrations
        UNION 
        SELECT  MAX(LogIn)
        FROM    Registrations
        UNION 
        SELECT  MIN(coalesce(LogOut, getdate()))
        FROM    Registrations
        UNION 
        SELECT  MAX(coalesce(LogOut, getdate()))
        FROM    Registrations ) x   

SET @NumberOfDays = DATEDIFF( DAY, @StartDate, @EndDate )   1
--SELECT @StartDate, @EndDate, @NumberOfDays

IF(SELECT OBJECT_ID('tempdb..#Days')) IS NOT NULL
    DROP TABLE #Days
CREATE TABLE #Days(N  INT, D DATETIME);

-- The cross join is slow so, if there are only a few days dont produce a cartesean product
IF((SELECT COUNT(1) FROM #Tally) >= @NumberOfDays) BEGIN
    INSERT INTO #Days(N,D)
        SELECT  [N] = T1.N ,
                [D] = DATEADD(DAY, T1.N-1, @StartDate)
        FROM      #Tally T1
        WHERE   T1.N <= @NumberOfDays
        ORDER BY T1.N
END
ELSE BEGIN
    INSERT INTO #Days(N,D)
        SELECT  [N] = ROW_NUMBER() OVER(ORDER BY T1.N, T2.N) ,
                [D] = DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY T1.N, T2.N)-1, @StartDate)
        FROM      #Tally T1
                CROSS JOIN #Tally T2
END            
--SELECT * FROM #Days

-- Ensure there are no missing rooms or days
; WITH base as (
    SELECT  DISTINCT r.RoomName, d.D 
    FROM    Registrations r
            CROSS JOIN #Days d
)
--SELECT * FROM base ORDER BY 1,2 /*

-- calculate all the times down to the second that a room can be in use
, RoomSlot as (
    SELECT  b.RoomName,  DATEADD(SECOND, t.N -1, b.D) D
    FROM    base b
            CROSS JOIN #Tally t
)
--SELECT * FROM RoomSlot ORDER BY 1,2 /*

-- find each second a room is in use
, data as (
    SELECT  rs.*, r.NumberOfPeople, r.id, r.[LogIn], r.[Logout]
    FROM    RoomSlot rs 
            INNER JOIN Registrations r ON  r.RoomName = rs.RoomName
                                       AND r.LogIn <= rs.D  
                                       AND COALESCE(r.LogOut, GETDATE()) > rs.D
)
--SELECT * FROM data ORDER BY 1,2  /*

-- calculate all the people using a room for each second it is in use
, results as (
    SELECT  RoomName, D, SUM(NumberOfPeople) NumberOfPeople
    FROM    data
    GROUP BY RoomName, D
)
--SELECT * FROM results ORDER BY 1,2,3 /*

-- get the max usage
SELECT  RoomName, DATEADD(DAY, DATEDIFF(DAY,0,D),0), MAX(NumberOfPeople)
FROM    results
GROUP BY RoomName, DATEADD(DAY, DATEDIFF(DAY,0,D),0)
ORDER BY DATEADD(DAY, DATEDIFF(DAY,0,D),0), RoomName

-- */ -- */ -- */ -- */ -- */



/*
Actual output
Day         |    Room     | MaxNumberOfConcurrentLoggedInPeople 
------------ ------------- -------------------------------------
2022-01-31  |  MainHall   |                 12 
2022-01-31  |  SmallRoom  |                 9
2022-02-01  |  MainHall   |                 17
2022-02-01  |  SmallRoom  |                 3
*/

CodePudding user response:

Your data doesn't seem to quite match your expected output, and some of the data doesn't even make sense (logouts before logins), but I think you can solve it like this:

  • Unpivot the log-ins and -outs into separate events, with the quantity positive or negative respectively. Exclude nulls.
  • Calculate a running sum of the quantity, partitioned by room and date, ordered by time.
  • Then simply group by room and day, and take the maximum of the running sum.
SELECT
  evnt.Day,
  evnt.RoomName,
  MaxNumberOfConcurrentLoggedInPeople = MAX(evnt.RunningSum)
FROM (
    SELECT
      r.RoomName,
      Day = CAST(v.EventDateTime AS date),
      v.DiffQuantity,
      RunningSum = SUM(v.DiffQuantity) OVER (PARTITION BY r.RoomName, CAST(v.EventDateTime AS date) ORDER BY v.EventDateTime ROWS UNBOUNDED PRECEDING)
    FROM Registrations r
    CROSS APPLY (VALUES
        (r.LogIn, r.NumberOfPeople),
        (r.LogOut, -r.NumberOfPeople)
    ) v(EventDateTime, DiffQuantity)
    WHERE v.EventDateTime IS NOT NULL
) evnt
GROUP BY
  evnt.RoomName,
  evnt.Day;

db<>fiddle

  •  Tags:  
  • Related