The Problem: How can I combine two SQL conditions for matching/joining into one statement? I want both missing records, but also if the record I have is outdated, I want to grab the latest and greatest. I have done with two separate SQL queries and then used a UNION on the DISTINCT Id's but I would prefer to combine this into one query if possible. I'm struggling with the approach so any help is appreciated. Thank you!
Source Table
| id | modified_at |
|---|---|
| a | 2023-10-02 16:42 |
| b | 2023-10-03 16:42 |
| c | 2023-10-04 16:42 |
| d | 2023-10-05 16:42 |
| e | 2023-10-06 16:42 |
| f | 2023-10-07 16:42 |
| g | 2023-10-08 16:42 |
| h | 2023-10-09 16:42 |
| i | 2023-10-10 16:32 |
| j | 2023-10-11 16:52 |
Destination Table
| id | modified_at |
|---|---|
| a | 2023-10-02 16:42 |
| b | 2023-10-03 16:42 |
| c | 2023-10-04 16:42 |
| d | 2023-10-05 16:42 |
| h | 2023-10-09 16:42 |
| i | 2023-10-10 16:32 |
| j | 2023-09-11 16:52 |
Desired Result from SELECT
| id | modified_at |
|---|---|
| e | 2023-10-06 16:42 |
| f | 2023-10-07 16:42 |
| g | 2023-10-08 16:42 |
| j | 2023-10-11 16:52 |
Rationale: e, f, and g are missing records in destination, so we want to include them in our result set. Last but not least j is included because source has a more recent record than destination due to the modified_at timestamp, so we want to include it to get the latest and greatest. My current attempts are below:
Technical Limitation: Not able to use transactions, if statements, loops, inserts, or updates. I'm strictly limited to select statements, joins, crosses, etc..
SQL: All records that changed
SELECT
id,
modified_at
FROM source
WHERE source.modified_at >= (SELECT MAX(modified_at) AS last_modified_at FROM destination)
SQL: All missing records
SELECT
source.id,
source.modified_at
FROM source
LEFT JOIN destination
ON source.Id = destination.id
WHERE destination.id IS NULL
CodePudding user response:
When asking questions like this it's helpful to provide the DDL/DML:
DECLARE @SourceTable TABLE (ID VARCHAR(1), ModifiedDateTime DATETIME);
INSERT INTO @SourceTable (ID, ModifiedDateTime) VALUES
('a', '2023-10-02 16:42'), ('b', '2023-10-03 16:42'), ('c', '2023-10-04 16:42'), ('d', '2023-10-05 16:42'),
('e', '2023-10-06 16:42'), ('f', '2023-10-07 16:42'), ('g', '2023-10-08 16:42'), ('h', '2023-10-09 16:42'),
('i', '2023-10-10 16:32'), ('j', '2023-10-11 16:52');
DECLARE @DestinationTable TABLE (ID VARCHAR(1), ModifiedDateTime DATETIME);
INSERT INTO @DestinationTable (ID, ModifiedDateTime) VALUES
('a', '2023-10-02 16:42'), ('b', '2023-10-03 16:42'), ('c', '2023-10-04 16:42'), ('d', '2023-10-05 16:42'),
('h', '2023-10-09 16:42'),
('i', '2023-10-10 16:32'), ('j', '2023-09-11 16:52');
As @Dai suggests, you can either use an OR or a UNION ALL here:
/* OR */
SELECT *
FROM @SourceTable s
LEFT OUTER JOIN @DestinationTable d
ON s.ID = d.ID
WHERE d.ID IS NULL
OR s.ModifiedDateTime > d.ModifiedDateTime;
/* UNION ALL */
SELECT *
FROM @SourceTable s
LEFT OUTER JOIN @DestinationTable d
ON s.ID = d.ID
WHERE d.ID IS NULL
UNION ALL
SELECT *
FROM @SourceTable s
LEFT OUTER JOIN @DestinationTable d
ON s.ID = d.ID
WHERE s.ModifiedDateTime > d.ModifiedDateTime;
| ID | ModifiedDateTime | ID | ModifiedDateTime |
|---|---|---|---|
| e | 2023-10-06 16:42:00.000 | ||
| f | 2023-10-07 16:42:00.000 | ||
| g | 2023-10-08 16:42:00.000 | ||
| j | 2023-10-11 16:52:00.000 | j | 2023-09-11 16:52:00.000 |
CodePudding user response:
Isn't this a simple OR:
SELECT
source.id,
source.modified_at
FROM source
LEFT JOIN destination
ON source.Id = destination.id
WHERE (
destination.id IS NULL
OR source.modified_at >= destination.modified_at
)
Your code does some sort of MAX so they're not completely equavalent, but i think the above is the canonical version of missing / new changes
