Home > Blockchain >  JOIN Tables based on Service Date
JOIN Tables based on Service Date

Time:02-04

I have 2 Tables (History and Responsible). They need to be JOINED based on Service Date.

History Table:

Id ServiceDate Hours ClientId ClientName
1 2021-10-15 3 123 Tom Holland
2 2021-10-25 5 123 Tom Holland
3 2022-01-14 2 123 Tom Holland

Responsible Table:

2999-12-31 means Responsible has no end date (current)

ClientId ClientName ResponsibleId ResponsibleName ResponsibleStartDate ResponsibleEndtDate
123 Tom Holland 77 Thomas Anderson 2020-09-17 2021-10-17
123 Tom Holland 88 Tom Cruise 2021-10-18 2999-12-31
123 Tom Holland 99 Sten Lee 2022-01-07 2999-12-31

My code produces multiple rows, because 2022-01-14 Service date falls under multiple date ranges from Responsible Table:

SELECT h.Id, 
       h.ServiceDate, 
       h.Hours, 
       h.ClientId, 
       h.ClientName, 
       r.ResponsibleName
FROM History AS h
LEFT JOIN Responsible AS r
   ON (h.ClientId = r.ClientId AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)

The output of the query above is:

Id ServiceDate Hours ClientId ClientName ResponsibleName
1 2021-10-15 3 123 Tom Holland Thomas Anderson
2 2021-10-25 5 123 Tom Holland Tom Cruise
3 2022-01-14 2 123 Tom Holland Tom Cruise
3 2022-01-14 2 123 Tom Holland Sten Lee

Technically, output is correct (because 2022-01-14 is between 2021-10-18 - 2999-12-31 as well between 2022-01-07 - 2999-12-31), but not what I need.

I would like to know if possible to achieve 2 outputs:

1) If Service Date falls in multiple date ranges from Responsible Table, Responsible Should be the person who's ResponsibleStartDate is closer to the ServiceDate:

Id ServiceDate Hours ClientId ClientName ResponsibleName
1 2021-10-15 3 123 Tom Holland Thomas Anderson
2 2021-10-25 5 123 Tom Holland Tom Cruise
3 2022-01-14 2 123 Tom Holland Sten Lee

2) Keep all rows, if Service Date falls in multiple date ranges from Responsible Table, but split Hours evenly between Responsible:

Id ServiceDate Hours ClientId ClientName ResponsibleName
1 2021-10-15 3 123 Tom Holland Thomas Anderson
2 2021-10-25 5 123 Tom Holland Tom Cruise
3 2022-01-14 1 123 Tom Holland Tom Cruise
3 2022-01-14 1 123 Tom Holland Sten Lee

CodePudding user response:

First one, we can use a window function to apply a row number, based on how close to ServiceDate the ResponsibleStartDate is, then we can just pick the first row per h.Id. If there is a tie we can break it by picking something that will give us deterministic order, e.g. ORDER BY {DATEDIFF expression}, ResponsibleName.

;WITH cte AS 
(
  SELECT h.Id, 
       h.ServiceDate, 
       h.Hours, 
       h.ClientId, 
       h.ClientName, 
       r.ResponsibleName, 
       RankOrderedByProximityToServiceDate = ROW_NUMBER() OVER 
         (PARTITION BY h.Id 
          ORDER BY ABS(DATEDIFF(DAY, ResponsibleStartDate, ServiceDate)))
  FROM dbo.History AS h
  LEFT JOIN dbo.Responsible AS r
     ON (h.ClientId = r.ClientId 
     AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM cte WHERE RankOrderedByProximityToServiceDate = 1;

Output:

Id ServiceDate Hours ClientId ClientName ResponsibleName
1 2021-10-15 3 123 Tom Holland Thomas Anderson
2 2021-10-25 5 123 Tom Holland Tom Cruise
3 2022-01-14 2 123 Tom Holland Sten Lee

Second one doesn't require a CTE, we can simply divide the Hours in h by the number of rows that exist for that h.Id, then limit it to 2 decimal places:

SELECT h.Id, 
       h.ServiceDate,
       Hours = CONVERT(decimal(11,2), 
         h.Hours * 1.0
         / COUNT(h.Id) OVER (PARTITION BY h.Id)),
       h.ClientId, 
       h.ClientName, 
       r.ResponsibleName
FROM dbo.History AS h
LEFT JOIN dbo.Responsible AS r
   ON (h.ClientId = r.ClientId 
   AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate);

Output:

Id ServiceDate Hours ClientId ClientName ResponsibleName
1 2021-10-15 3.00 123 Tom Holland Thomas Anderson
2 2021-10-25 5.00 123 Tom Holland Tom Cruise
3 2022-01-14 1.00 123 Tom Holland Tom Cruise
3 2022-01-14 1.00 123 Tom Holland Sten Lee

Both demonstrated in this db<>fiddle.

CodePudding user response:

My attempt at part 1 - it doesn't work if there's more than one Responsible as of the same start date.

WITH
"all_services" AS (
    SELECT
        h.Id, 
        h.ServiceDate, 
        h.Hours, 
        h.ClientId, 
        h.ClientName, 
        r.ResponsibleName,
        r.ResponsibleStartDate
    FROM History AS h
    LEFT JOIN Responsible AS r
           ON h.ClientId = r.ClientId
          AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
),
"most_recent_key" AS (
    SELECT
        ServiceDate,
        ClientId,
        MAX(ResponsibleStartDate) AS "ResponsibleStartDate"
    FROM all_services
    GROUP BY ServiceDate, ClientId
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM all_services
INNER JOIN most_recent_key
     USING (ServiceDate, ClientId, ResponsibleStartDate)

Posting it anyway as a contrast to Aaron's better solution as a learning point for myself.

  •  Tags:  
  • Related