Which is the best practice to link tables which have unique relation for a determined period of time?
Example 1
Tab_Civil_Status:
| ID | NAME | ID_STATUS |
|---|---|---|
| 1 | Jenny | 1 |
| 2 | Mike | 2 |
Tab_Civil_Status_Desc:
| ID_STATUS | STATUS_DESC |
|---|---|
| 1 | Single |
| 2 | Married |
Tab_Civil_Events:
| PERSON-ID | EVENT | REF_Date | new-Status |
|---|---|---|---|
| 2 | Wedding | 10/5/2017 | 2 |
A query to Tab_Civil_Status should provide the below:
Case A:
Select * FROM `Tab_Civil_Status` […] Where REF_Date = 1/1/2020
| ID | NAME | ID_STATUS |
|---|---|---|
| 1 | Jenny | 1 |
| 2 | Mike | 2 |
Case B:
Select * FROM `Tab_Civil_Status` […] Where REF_Date = 1/1/2016
| ID | NAME | ID_STATUS |
|---|---|---|
| 1 | Jenny | 1 |
| 2 | Mike | 1 |
Example 2
Tab_Orders:
| ID | Order_Code | Product_Id |
|---|---|---|
| 1 | a | 1 |
| 2 | b | 2 |
Tab_Products:
| Product_Id | Product_Name |
|---|---|
| 1 | Apple |
| 2 | Meta |
Tab_Products_Events:
| Product_Id | EVENT | REF_Date | Old_Name | New_Name |
|---|---|---|---|---|
| 2 | Change_Name | 1/12/2020 | Meta |
These are simplified sample, what I am looking for is the best practice to structure tables and queries for similar cases; the logic to apply rather than the actual code to use.
a - Is the best practice to have an "Event Table" as in the above cases tracking the changes? or there are better ways?
I know there might be sevaral ways to reach the same scope, another sample is the below
Tab_Products_Events_Sample_2:
| Product_Id | EVENT | REF_Date | Name |
|---|---|---|---|
| 2 | Given _Name | 1/12/2010 | |
| 2 | Change_Name | 1/12/2019 | Meta |
b - is there a more efficient query than the below? Also based on the above point
@Selected_date = '1/1/2019'
SELECT
o.[Order_Code],
ifnull(e.[Old_Name],p.[Product_Name]) as p_name
FROM Tab_Orders as o
LEFT JOIN Tab_Products as p
ON o.[Product_Id] = p.[Product_Id]
CROSS APPLY (
SELECT top 1 t.[Old_Name]
FROM Tab_Products_Events as t
WHERE o.[Product_Id] = t.[Product_Id]
and t.[REF_Date] >= @Selected_date
ORDER BY t.[REF_Date] desc
) e
result
| Order_Code | p_name |
|---|---|
| a | Apple |
| b |
Thanks
CodePudding user response:
Note: My answer is based on the first example
1st approach - Generating date ranges
You should first extract the date ranges for each person status using a similar query:
SELECT PERSON_ID, new_Status, REF_Date,
ISNULL(LEAD(REF_Date) OVER(PARTITION BY PERSON_ID ORDER BY REF_DATE), GETDATE()) End_DATE
FROM Tab_Civil_Events
As an example, consider the following sample data:
CREATE TABLE Tab_Civil_Events(PERSON_ID INT, EVENT VARCHAR(50), REF_Date DATETIME, new_Status INT)
INSERT INTO Tab_Civil_Events(PERSON_ID , EVENT, REF_Date, new_Status)
VALUES (2, 'Wedding', '10/5/2017', 2)
, (2, 'Divorce', '10/5/2018', 1)
, (2, 'Wedding', '10/5/2019', 2)
, (2, 'Divorce', '10/5/2020', 1)
SELECT PERSON_ID, new_Status, REF_Date, ISNULL(LEAD(REF_Date) OVER(PARTITION BY PERSON_ID ORDER BY REF_DATE), GETDATE()) End_DATE
FROM Tab_Civil_Events
This will result in the following table:
| PERSON_ID | new_Status | REF_Date | End_DATE |
|---|---|---|---|
| 2 | 2 | 2017-10-05T00:00:00Z | 2018-10-05T00:00:00Z |
| 2 | 1 | 2018-10-05T00:00:00Z | 2019-10-05T00:00:00Z |
| 2 | 2 | 2019-10-05T00:00:00Z | 2020-10-05T00:00:00Z |
| 2 | 1 | 2020-10-05T00:00:00Z | 2022-01-29T09:47:52.7Z |
The second step is to create a common table expression to query the data. For example:
DECLARE @QueyrDate DATETIME = '20190501'
;WITH CTE_1 As (SELECT PERSON_ID,
new_Status,
REF_Date,
ISNULL(LEAD(REF_Date) OVER(PARTITION BY PERSON_ID ORDER BY REF_DATE),
GETDATE()) End_DATE
FROM Tab_Civil_Events)
SELECT CTE_1.PERSON_ID,
dsc.STATUS
FROM CTE_1
INNER JOIN Tab_Civil_Status_Desc dsc
ON CTE_1.new_Status = dsc.ID
WHERE @QueyrDate BETWEEN REF_Date AND End_DATE
Using the sample data I provided previously, this query will give the following result:
| PERSON_ID | STATUS |
|---|---|
| 2 | Single |
While changing the date value to 20191101
SET @QueyrDate = '20191101'
Will result in the following values:
| PERSON_ID | STATUS |
|---|---|
| 2 | Married |
Make sure that you have created the appropriate indexes to support your queries. For example:
CREATE NONCLUSTERED INDEX IX_PersonId_RefDate
ON Tab_Civil_Events(PERSON_ID, REF_DATE)
INCLUDE(new_status);
2nd approach - Retrieving the nearest event date
You should first retrieve the maximum event dates for each person having a value smallest than the given date:
SELECT PERSON_ID,MAX(Ref_Date) as event_date FROM Tab_Civil_Events
WHERE Ref_Date <= @QueyrDate
GROUP BY PERSON_ID
Next, you should join this query with the events table to get the status information as follows:
DECLARE @QueyrDate DATETIME = '20190501'
SELECT tbl.PERSON_ID, dsc.STATUS
FROM Tab_Civil_Events tbl INNER JOIN
Tab_Civil_Status_Desc dsc ON tbl.new_Status = dsc.ID INNER JOIN
(SELECT PERSON_ID,MAX(Ref_Date) as event_date FROM Tab_Civil_Events
WHERE Ref_Date <= @QueyrDate
GROUP BY PERSON_ID) t1 ON tbl.PERSON_ID = t1.PERSON_ID and t1.event_date = tbl.ref_date
3rd approach - Temporal tables
I am posting this part after you mentioned that you can change the way the data is stored
Using SQL Server 2016 or a newer version, you can use temporal tables to simplify querying your data.
As mentioned in the official documentation:
SQL Server 2016 introduced support for temporal tables (also known as system-versioned temporal tables) as a database feature that brings built-in support for providing information about data stored in the table at any point in time rather than only the data that is correct at the current moment in time.
Temporal tables are implemented as a pair of tables, a current table and a history table. Within each of these tables, start and end dates are stored and used to define the period of validity for each row.
You can refer to the following resources to learn more about this type of data structure:
- Official documentation
- First Look at System-Versioned Temporal Tables-Part 1: Creating Tables and Modifying Data
- First Look at System-Versioned Temporal Tables-Part 2: Querying Data and Optimization Considerations
- Tips Concerning Temporal Tables
- Temporal Tables, Partitioning, and ColumnStore Indexes
