Given the following table structure and sample data:
CREATE TABLE IF NOT EXISTS `records` (
`id` int unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
`external_id` int unsigned NOT NULL,
`sub_id` int unsigned DEFAULT 0,
`amount` bigint unsigned NOT NULL,
`updated_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) DEFAULT CHARSET=utf8;
INSERT INTO `records` (`external_id`, `sub_id`,`amount`, `updated_at`) VALUES
(1, 0, 160, '2022-01-13 16:00:00'),
(1, 1001, 150, '2022-01-13 16:40:00'),
(1, 1002, 170, '2022-06-13 16:40:00'),
(1, 1003, 170, '2022-06-13 16:40:00');
I'm trying to get the MIN value for amount for the past X (assume 30 days), for a given external_id, using the timestamp field updated_at, with the following constraints:
- If there are no records for the past 30 days (changes), the latest record is still a valid one,
- Each new record for a given
external_idwould "cancel and replace" the previous record, - If there are both records with
sub_id = 0andsub_id <> 0, for the same givenexternal_idthe records withsub_id <> 0would prevail.
So, the query for the above data should return 150.
A fiddle and what I have tried at: https://dbfiddle.uk/?rdbms=mariadb_10.6&fiddle=e4ddd6b55dbccf607633c1cf7d9cd4ef
Extra Information (Later edit)
To give you a better picture of the whole idea here:
Each time an update is made on the amount field, there is a new record created in the records table (to create a history log).
My task is to query the MIN amount for the past 30 days.
Some external_id records also have that sub_id.
Whether there is or isn't a sub_id, the record for the external_id would be created.
Amounts with sub_id are usually bigger (something extra gets added to the amount for product_id).
CodePudding user response:
At the moment it's not clear what should happen if there are multiple external_id values. You seem to want only one row returned? (If this is not the case, please improve the example to include the desired results when there are multiple different external_id values.)
If you do just want one value returned, you could simply ORDER BY <something> LIMIT 1
I, however, am going to assume you want just one value per external_id.
WITH
filtered_sorted
AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY external_id
ORDER BY CASE WHEN sub_id <> 0 THEN 0 ELSE 1 END,
amount
)
AS rn
FROM
records AS r
WHERE
updated_at >= (SELECT COALESCE(MAX(updated_at), DATE(NOW()) - INTERVAL 30 DAY)
FROM records
WHERE updated_at <= DATE(NOW()) - INTERVAL 30 DAY
AND external_id = r.external_id
)
)
SELECT
*
FROM
filtered_sorted
WHERE
rn = 1
This is based on the notion that the most recent row on or before the start of the day 30 days ago is still valid and should be included in consideration for the lowest amount.
- For each
external_id...- Ignore all records
updated_atafter the start of 30 days ago - Return the most recent remaining
updated_at(which could be exactly the start of the day 30 days ago) - If no such row is found, return the start of the day 30 days ago
- We will consider all rows for that
updated_atonwards
- Ignore all records
Then the ROW_NUMBER() prefers rows with a non-zero sub_id and then rows with the lowest amount.
- For the rows being considered, after the
WHEREclause described above- Assign each row a row number
- Each
external_idshould have it's own individual sequence of row numbers (achieved withPARTITION BYexternal_id`) - Rows with
sub_id <> 0should come before before any rows withsub_id = 0(achieved byORDER BY CASE WHEN sub_id <> 0 THEN 0 ELSE 1 END) - Rows with lower
amountvalues should come first (achieved with `ORDER BY amount.)
Then, just return rows where the assigned row number is 1
- Partitioned by
external_id - Filtered by
updated_at - Sorted by
sub_id <> 0,amount
(One row per external_id)
