I am the end-user of a SQL Server DB with multiple lines ordered by date. Lets take this DB as an example:
| Amount | Date |
|---|---|
| 23.5 | 20210512010220111 |
| 24 | 20210512020220111 |
| 30 | 20210512030220111 |
| 1.2 | 20210513011020111 |
| 1000 | 20210513020220111 |
| 24 | 20210514100220111 |
| 240 | 20210514100220111 |
Be advised that the date is just a long that represent the date in the format: yyyymmddhhMMssfff.
I am trying to create a SQL query like this: "Get the first row of each day" So for the above example the result will be:
| Amount | Date |
|---|---|
| 23.5 | 20210512010220111 |
| 1.2 | 20210513011020111 |
| 24 | 20210514100220111 |
I saw this example in multiple sources: https://learnsql.com/cookbook/how-to-select-the-first-row-in-each-group-by-group/
The problem is when I tried it that was way too slow for me the DB is storing hundreds of millions of rows (with 9 columns each)
A couple of weeks ago I used a similar(ish) query for a daily min, avg, max:
SELECT MIN(Amount), AVG(Amount), MAX(Amount)
FROM table
GROUP BY Date/1000000000
- the "/1000000000" is for days.
That worked quickly enough, if there is something similar to FIRST(Amount) that would be great. Just to clarify, I am just an end-user, I have no saying over the overall structure of the DB.
Edit: This is the query I tried and was too slow:
WITH added_row_number AS (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY Date/1000000000 ORDER BY Date ASC) AS row_number
FROM table
)
SELECT
*
FROM added_row_number
WHERE row_number = 1;
Thank you.
CodePudding user response:
try something to affect
SELECT t.yyyy_mm_dd_date, table.*
FROM table
JOIN (
SELECT SUBSTRING (Date, 1, 8) as yyyy_mm_dd_date, MIN(Date) as min_date
FROM table
) t
ON t.min_date = table.Date
in general i find SQL queries to run fast with joins and aggregations (especially over their indices), so if can translate query to use those believe generally should run fairly fast
CodePudding user response:
Feels like you can use a cte here and cast the date strings as actual DATE values (the last query) here is my sample using test data (you did not post the column types so I guessed)
Not 100% clear about the "date" column if it is an actual datetime, you can just cast as a date
DECLARE @mytable TABLE (
Amount NUMERIC(10,2) NOT NULL,
[Date] VARCHAR(30) NOT NULL
);
INSERT INTO @mytable(Amount,[Date])
VALUES
(3.5, '20210512010220111'),
(24.0, '20210512020220111'),
(30.0,'20210512030220111'),
(1.2, '20210513011020111'),
(1000.0, '20210513020220111'),
(24.0, '20210514100220111'),
(240.0, '20210514100220111')
;
SELECT
[Amount],
MAX(CAST( LEFT([Date], 8) AS DATE)) AS NewDate
FROM @mytable
GROUP BY AMOUNT
ORDER BY MAX(CAST( LEFT([Date], 8) AS DATE)) DESC;
/* this is what we want perhaps: */
;
WITH cte AS (
SELECT
Amount,
CAST(LEFT([Date], 8) AS DATE) AS MyDate,
ROW_NUMBER() OVER(PARTITION BY CAST(LEFT([Date], 8) AS DATE) ORDER BY CAST(LEFT([Date], 8) AS DATE) DESC) AS row_number
FROM @mytable
)
SELECT
*
FROM cte
WHERE row_number = 1;
