I have 2 Tables (History and Responsible). They need to be JOINED based on Service Date.
History Table:
| Id | ServiceDate | Hours | ClientId | ClientName |
|---|---|---|---|---|
| 1 | 2021-10-15 | 3 | 123 | Tom Holland |
| 2 | 2021-10-25 | 5 | 123 | Tom Holland |
| 3 | 2022-01-14 | 2 | 123 | Tom Holland |
Responsible Table:
2999-12-31 means Responsible has no end date (current)
| ClientId | ClientName | ResponsibleId | ResponsibleName | ResponsibleStartDate | ResponsibleEndtDate |
|---|---|---|---|---|---|
| 123 | Tom Holland | 77 | Thomas Anderson | 2020-09-17 | 2021-10-17 |
| 123 | Tom Holland | 88 | Tom Cruise | 2021-10-18 | 2999-12-31 |
| 123 | Tom Holland | 99 | Sten Lee | 2022-01-07 | 2999-12-31 |
My code produces multiple rows, because 2022-01-14 Service date falls under multiple date ranges from Responsible Table:
SELECT h.Id,
h.ServiceDate,
h.Hours,
h.ClientId,
h.ClientName,
r.ResponsibleName
FROM History AS h
LEFT JOIN Responsible AS r
ON (h.ClientId = r.ClientId AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)
The output of the query above is:
| Id | ServiceDate | Hours | ClientId | ClientName | ResponsibleName |
|---|---|---|---|---|---|
| 1 | 2021-10-15 | 3 | 123 | Tom Holland | Thomas Anderson |
| 2 | 2021-10-25 | 5 | 123 | Tom Holland | Tom Cruise |
| 3 | 2022-01-14 | 2 | 123 | Tom Holland | Tom Cruise |
| 3 | 2022-01-14 | 2 | 123 | Tom Holland | Sten Lee |
Technically, output is correct (because 2022-01-14 is between 2021-10-18 - 2999-12-31 as well between 2022-01-07 - 2999-12-31), but not what I need.
I would like to know if possible to achieve 2 outputs:
1) If Service Date falls in multiple date ranges from Responsible Table, Responsible Should be the person who's ResponsibleStartDate is closer to the ServiceDate:
| Id | ServiceDate | Hours | ClientId | ClientName | ResponsibleName |
|---|---|---|---|---|---|
| 1 | 2021-10-15 | 3 | 123 | Tom Holland | Thomas Anderson |
| 2 | 2021-10-25 | 5 | 123 | Tom Holland | Tom Cruise |
| 3 | 2022-01-14 | 2 | 123 | Tom Holland | Sten Lee |
2) Keep all rows, if Service Date falls in multiple date ranges from Responsible Table, but split Hours evenly between Responsible:
| Id | ServiceDate | Hours | ClientId | ClientName | ResponsibleName |
|---|---|---|---|---|---|
| 1 | 2021-10-15 | 3 | 123 | Tom Holland | Thomas Anderson |
| 2 | 2021-10-25 | 5 | 123 | Tom Holland | Tom Cruise |
| 3 | 2022-01-14 | 1 | 123 | Tom Holland | Tom Cruise |
| 3 | 2022-01-14 | 1 | 123 | Tom Holland | Sten Lee |
CodePudding user response:
First one, we can use a window function to apply a row number, based on how close to ServiceDate the ResponsibleStartDate is, then we can just pick the first row per h.Id. If there is a tie we can break it by picking something that will give us deterministic order, e.g. ORDER BY {DATEDIFF expression}, ResponsibleName.
;WITH cte AS
(
SELECT h.Id,
h.ServiceDate,
h.Hours,
h.ClientId,
h.ClientName,
r.ResponsibleName,
RankOrderedByProximityToServiceDate = ROW_NUMBER() OVER
(PARTITION BY h.Id
ORDER BY ABS(DATEDIFF(DAY, ResponsibleStartDate, ServiceDate)))
FROM dbo.History AS h
LEFT JOIN dbo.Responsible AS r
ON (h.ClientId = r.ClientId
AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM cte WHERE RankOrderedByProximityToServiceDate = 1;
Output:
| Id | ServiceDate | Hours | ClientId | ClientName | ResponsibleName |
|---|---|---|---|---|---|
| 1 | 2021-10-15 | 3 | 123 | Tom Holland | Thomas Anderson |
| 2 | 2021-10-25 | 5 | 123 | Tom Holland | Tom Cruise |
| 3 | 2022-01-14 | 2 | 123 | Tom Holland | Sten Lee |
Second one doesn't require a CTE, we can simply divide the Hours in h by the number of rows that exist for that h.Id, then limit it to 2 decimal places:
SELECT h.Id,
h.ServiceDate,
Hours = CONVERT(decimal(11,2),
h.Hours * 1.0
/ COUNT(h.Id) OVER (PARTITION BY h.Id)),
h.ClientId,
h.ClientName,
r.ResponsibleName
FROM dbo.History AS h
LEFT JOIN dbo.Responsible AS r
ON (h.ClientId = r.ClientId
AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate);
Output:
| Id | ServiceDate | Hours | ClientId | ClientName | ResponsibleName |
|---|---|---|---|---|---|
| 1 | 2021-10-15 | 3.00 | 123 | Tom Holland | Thomas Anderson |
| 2 | 2021-10-25 | 5.00 | 123 | Tom Holland | Tom Cruise |
| 3 | 2022-01-14 | 1.00 | 123 | Tom Holland | Tom Cruise |
| 3 | 2022-01-14 | 1.00 | 123 | Tom Holland | Sten Lee |
Both demonstrated in this db<>fiddle.
CodePudding user response:
My attempt at part 1 - it doesn't work if there's more than one Responsible as of the same start date.
WITH
"all_services" AS (
SELECT
h.Id,
h.ServiceDate,
h.Hours,
h.ClientId,
h.ClientName,
r.ResponsibleName,
r.ResponsibleStartDate
FROM History AS h
LEFT JOIN Responsible AS r
ON h.ClientId = r.ClientId
AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
),
"most_recent_key" AS (
SELECT
ServiceDate,
ClientId,
MAX(ResponsibleStartDate) AS "ResponsibleStartDate"
FROM all_services
GROUP BY ServiceDate, ClientId
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM all_services
INNER JOIN most_recent_key
USING (ServiceDate, ClientId, ResponsibleStartDate)
Posting it anyway as a contrast to Aaron's better solution as a learning point for myself.
