In a Postgres database I have a table with the following columns:
ID (Pimary Key)
Code
Date
I'm trying to extract data ordered by Date and grouped by Code so that the most recent date will determine what code rows should be grouped first and so forth (if it makes sense). An example:
007 2022-01-04
007 2022-01-01
007 2021-12-19
002 2022-01-03
002 2021-12-02
002 2021-11-15
035 2022-01-01
035 2021-11-30
035 2021-05-03
001 2021-12-31
022 2021-12-07
076 2021-11-19
I thought I could achieve this with the following query:
SELECT * FROM Table
GROUP BY Table.Code
ORDER BY Table.Date DESC
but this gives me
ERROR: column "Table.ID" must appear in the GROUP BY clause or be used in an aggregate function
And if I add the column ID to the GROUP BY the result I get is just a list ordered by Date with all the Codes mixed.
Is there any way to achieve whai I want?
CodePudding user response:
First, select the columns that you want to group e.g. Code, that you want to apply an aggregate function (Date).
Second, list the columns that you want to group in the GROUP BY clause.
In the order by clause, use the same logic as the select clause.
https://www.postgresqltutorial.com/postgresql-group-by/
Tables:
CREATE TABLE "Table"
("Code" int, "Date" timestamp)
;
INSERT INTO "Table"
("Code", "Date")
VALUES
(007, '2022-01-04 00:00:00'),
(007, '2022-01-01 00:00:00'),
(007, '2021-12-19 00:00:00'),
(002, '2022-01-03 00:00:00'),
(002, '2021-12-02 00:00:00'),
(002, '2021-11-15 00:00:00'),
(035, '2022-01-01 00:00:00'),
(035, '2021-11-30 00:00:00'),
(035, '2021-05-03 00:00:00'),
(001, '2021-12-31 00:00:00'),
(022, '2021-12-07 00:00:00'),
(076, '2021-11-19 00:00:00')
;
Select
SELECT
"Table"."Code",
max("Table"."Date")
FROM
"Table"
GROUP BY
"Table"."Code"
ORDER BY
max("Table"."Date") DESC
Output:
| Code | max |
|---|---|
| 7 | 2022-01-04T00:00:00Z |
| 2 | 2022-01-03T00:00:00Z |
| 35 | 2022-01-01T00:00:00Z |
| 1 | 2021-12-31T00:00:00Z |
| 22 | 2021-12-07T00:00:00Z |
| 76 | 2021-11-19T00:00:00Z |
Edit1
A group by clause should contain a unique value per line.
The example above showed a way to fix the error on your data.
Table with ID:
CREATE TABLE "Table" (
"ID" serial not null primary key,
"Code" varchar,
"Date" timestamp
);
INSERT INTO "Table"
("Code", "Date")
VALUES
('007', '2022-01-04 00:00:00'),
('007', '2022-01-01 00:00:00'),
('007', '2021-12-19 00:00:00'),
('002', '2022-01-03 00:00:00'),
('002', '2021-12-02 00:00:00'),
('002', '2021-11-15 00:00:00'),
('035', '2022-01-01 00:00:00'),
('035', '2021-11-30 00:00:00'),
('035', '2021-05-03 00:00:00'),
('001', '2021-12-31 00:00:00'),
('022', '2021-12-07 00:00:00'),
('076', '2021-11-19 00:00:00')
;
Select:
SELECT * FROM "Table" ORDER BY "Code", "Date" DESC;
Output:
| ID | Code | Date |
|---|---|---|
| 10 | 001 | 2021-12-31T00:00:00Z |
| 4 | 002 | 2022-01-03T00:00:00Z |
| 5 | 002 | 2021-12-02T00:00:00Z |
| 6 | 002 | 2021-11-15T00:00:00Z |
| 1 | 007 | 2022-01-04T00:00:00Z |
| 2 | 007 | 2022-01-01T00:00:00Z |
| 3 | 007 | 2021-12-19T00:00:00Z |
| 11 | 022 | 2021-12-07T00:00:00Z |
| 7 | 035 | 2022-01-01T00:00:00Z |
| 8 | 035 | 2021-11-30T00:00:00Z |
| 9 | 035 | 2021-05-03T00:00:00Z |
| 12 | 076 | 2021-11-19T00:00:00Z |
Edit 2:
I join a select b with the entire dataset. The select b is used for sort only and is what you tried.
With "b" as
( select
"Code",
max("Date") as "Date"
from
"Table"
group by
"Code"
)
SELECT
"Table"."Code",
"Table"."Date"
FROM
"Table" left join "b" on "Table"."Code" = "b"."Code"
ORDER BY
"b"."Date" desc,
"Table"."Date" DESC;
Output:
| Code | Date |
|---|---|
| 007 | 2022-01-04T00:00:00Z |
| 007 | 2022-01-01T00:00:00Z |
| 007 | 2021-12-19T00:00:00Z |
| 002 | 2022-01-03T00:00:00Z |
| 002 | 2021-12-02T00:00:00Z |
| 002 | 2021-11-15T00:00:00Z |
| 035 | 2022-01-01T00:00:00Z |
| 035 | 2021-11-30T00:00:00Z |
| 035 | 2021-05-03T00:00:00Z |
| 001 | 2021-12-31T00:00:00Z |
| 022 | 2021-12-07T00:00:00Z |
| 076 | 2021-11-19T00:00:00Z |
Edit 3
Shorter solution using max over partition by.
SELECT
"Code",
"Date"
FROM
"Table"
ORDER BY
max("Date") over (partition by "Code") DESC,
"Table"."Date" DESC
;
Output:
| Code | Date |
|---|---|
| 007 | 2022-01-04T00:00:00Z |
| 007 | 2022-01-01T00:00:00Z |
| 007 | 2021-12-19T00:00:00Z |
| 002 | 2022-01-03T00:00:00Z |
| 002 | 2021-12-02T00:00:00Z |
| 002 | 2021-11-15T00:00:00Z |
| 035 | 2022-01-01T00:00:00Z |
| 035 | 2021-11-30T00:00:00Z |
| 035 | 2021-05-03T00:00:00Z |
| 001 | 2021-12-31T00:00:00Z |
| 022 | 2021-12-07T00:00:00Z |
| 076 | 2021-11-19T00:00:00Z |
CodePudding user response:
Edit 2 and the more elegant Edit 3 from @DavidLukas help me achieve what I needed
