Home > Blockchain >  SQL Server Grouping rows based on time span between rows
SQL Server Grouping rows based on time span between rows

Time:02-02

Lets say we have this table:

Date Par1 Par2
2020-01-31 00:00:10 1 null
2020-01-31 00:00:15 2 null
2020-01-31 00:00:16 null 4
2020-01-31 00:00:30 3 null

The goal is to obtain the values of Par1 and Par2 on the same row (using SQL) if the time between the 2 rows is less than 2 seconds. In this case row 2 and 3 are only 1 second apart, so they should be displayed on the same row. So the desired result is this:

Date Par1 Par2
2020-01-31 00:00:10 1 null
2020-01-31 00:00:15 2 4
2020-01-31 00:00:30 3 null

Par1 and Par2 are filled by separate data flows, so when Par1 is NOT null, Par2 is always going to be null.

CodePudding user response:

I've grouped data by proximity in that past, but had to step through several intermediate steps to get the grouping I wanted. The steps used are as follows:

  1. Sort and assign unambiguous row numbers to the data.
  2. Tag rows that are the start or end of a group based on distance from preceding or following record.
  3. For each group starting row, find the matching ending row and generate a result that aggregates data within that range.

Below is an adaptation that might suit your purposes:

DECLARE @Data TABLE ([Date] DATETIME, Par1 INT, Par2 INT)
INSERT @Data
VALUES 
    ('2020-01-31 00:00:10', 1, null),
    ('2020-01-31 00:00:15', 2, null),
    ('2020-01-31 00:00:16', null, 4),
    ('2020-01-31 00:00:30', 3, null)

;
WITH NUMBERED_DATA AS (
    SELECT RowNo = ROW_NUMBER() OVER(ORDER BY [Date]), D.*
    FROM @Data D
),
TAGGED_DATA AS (
    SELECT B.IsFirst, B.IsLast, D.*
    FROM NUMBERED_DATA D
    CROSS APPLY(
        SELECT
            PriorDate = (SELECT D1.[Date] FROM NUMBERED_DATA D1 WHERE D1.RowNo = D.RowNo - 1),
            NextDate = (SELECT D2.[Date] FROM NUMBERED_DATA D2 WHERE D2.RowNo = D.RowNo   1)
    ) A
    CROSS APPLY(
        SELECT
            IsFirst = CASE WHEN A.PriorDate IS NULL OR DATEDIFF(second, A.PriorDate, D.[date]) >= 2 THEN 1 ELSE 0 END,
            IsLast = CASE WHEN A.NextDate IS NULL OR DATEDIFF(second, D.[date], A.NextDate) >= 2 THEN 1 ELSE 0 END
    ) B
),
COMBINED_DATA AS (
    SELECT D1.Date, Par1First = D1.Par1, Par2Last = D2.Par2, Par1Min = MIN(D.Par1), Par2Max = MAX(D.Par2)
    FROM TAGGED_DATA D1
    CROSS APPLY (
        SELECT TOP 1 D2.*
        FROM TAGGED_DATA D2
        WHERE D2.RowNo >= D1.RowNo AND D2.IsLast = 1
        ORDER BY D2.RowNo ASC
    ) D2
    JOIN TAGGED_DATA D ON D.RowNo BETWEEN D1.RowNo AND D2.RowNo
    WHERE D1.IsFirst = 1
    GROUP BY D1.Date, D1.Par1, D2.Par2

)
SELECT *
FROM COMBINED_DATA
ORDER BY Date

The above uses a combination of Common table Expressions (CTEs) and CROSS APPLY constructs to progressively calculate intermediate results. To see the intermediate results, change COMBINED_DATA to TAGGED_DATA in the final select.

I wasn't sure if you wanted the First/last or the Min/Max Par1/Par2 values, so the above calculates both. If only fist/last is needed, the JOIN ... BETWEEN ... and GROUP BY ... lines can be eliminated.

One limitation of the above is that it is inefficient for big data since it involves table scans or hash joins for the intermediate results. Ideally, RowNo should be indexed, but I do not believe this is possible for a CTE. (See Adding an INDEX to a CTE) To increase efficiency, the above logic would need to be adapted to use a temporary table.

  •  Tags:  
  • Related