Home > database >  GROUP BY and ORDER BY gives aggregate function error
GROUP BY and ORDER BY gives aggregate function error

Time:01-06

In a Postgres database I have a table with the following columns:

ID (Pimary Key)
Code
Date

I'm trying to extract data ordered by Date and grouped by Code so that the most recent date will determine what code rows should be grouped first and so forth (if it makes sense). An example:

007    2022-01-04
007    2022-01-01
007    2021-12-19
002    2022-01-03
002    2021-12-02
002    2021-11-15
035    2022-01-01
035    2021-11-30
035    2021-05-03
001    2021-12-31
022    2021-12-07
076    2021-11-19

I thought I could achieve this with the following query:

SELECT * FROM Table
GROUP BY Table.Code
ORDER BY Table.Date DESC

but this gives me

ERROR:  column "Table.ID" must appear in the GROUP BY clause or be used in an aggregate function

And if I add the column ID to the GROUP BY the result I get is just a list ordered by Date with all the Codes mixed.

Is there any way to achieve whai I want?

CodePudding user response:

First, select the columns that you want to group e.g. Code, that you want to apply an aggregate function (Date). Second, list the columns that you want to group in the GROUP BY clause. In the order by clause, use the same logic as the select clause.

https://www.postgresqltutorial.com/postgresql-group-by/

Tables:

CREATE TABLE "Table"
    ("Code" int, "Date" timestamp)
;

INSERT INTO "Table"
    ("Code", "Date")
VALUES
    (007, '2022-01-04 00:00:00'),
    (007, '2022-01-01 00:00:00'),
    (007, '2021-12-19 00:00:00'),
    (002, '2022-01-03 00:00:00'),
    (002, '2021-12-02 00:00:00'),
    (002, '2021-11-15 00:00:00'),
    (035, '2022-01-01 00:00:00'),
    (035, '2021-11-30 00:00:00'),
    (035, '2021-05-03 00:00:00'),
    (001, '2021-12-31 00:00:00'),
    (022, '2021-12-07 00:00:00'),
    (076, '2021-11-19 00:00:00')
;

Select

SELECT
  "Table"."Code",
   max("Table"."Date")
FROM
  "Table"
GROUP BY
  "Table"."Code"
ORDER BY
  max("Table"."Date") DESC

Output:

Code max
7 2022-01-04T00:00:00Z
2 2022-01-03T00:00:00Z
35 2022-01-01T00:00:00Z
1 2021-12-31T00:00:00Z
22 2021-12-07T00:00:00Z
76 2021-11-19T00:00:00Z

Edit1

A group by clause should contain a unique value per line.

The example above showed a way to fix the error on your data.

Table with ID:

CREATE TABLE "Table" (
     "ID" serial not null primary key,
     "Code" varchar,
     "Date" timestamp
);

INSERT INTO "Table"
    ("Code", "Date")
VALUES
    ('007', '2022-01-04 00:00:00'),
    ('007', '2022-01-01 00:00:00'),
    ('007', '2021-12-19 00:00:00'),
    ('002', '2022-01-03 00:00:00'),
    ('002', '2021-12-02 00:00:00'),
    ('002', '2021-11-15 00:00:00'),
    ('035', '2022-01-01 00:00:00'),
    ('035', '2021-11-30 00:00:00'),
    ('035', '2021-05-03 00:00:00'),
    ('001', '2021-12-31 00:00:00'),
    ('022', '2021-12-07 00:00:00'),
    ('076', '2021-11-19 00:00:00')
;

Select:

SELECT * FROM "Table" ORDER BY "Code", "Date" DESC;

Output:

ID Code Date
10 001 2021-12-31T00:00:00Z
4 002 2022-01-03T00:00:00Z
5 002 2021-12-02T00:00:00Z
6 002 2021-11-15T00:00:00Z
1 007 2022-01-04T00:00:00Z
2 007 2022-01-01T00:00:00Z
3 007 2021-12-19T00:00:00Z
11 022 2021-12-07T00:00:00Z
7 035 2022-01-01T00:00:00Z
8 035 2021-11-30T00:00:00Z
9 035 2021-05-03T00:00:00Z
12 076 2021-11-19T00:00:00Z

Edit 2:

I join a select b with the entire dataset. The select b is used for sort only and is what you tried.

With "b" as
( select
   "Code",
    max("Date") as "Date"
  from
   "Table"
  group by
   "Code"
)
SELECT
 "Table"."Code",
 "Table"."Date"
FROM
"Table" left join "b" on "Table"."Code" = "b"."Code"
ORDER BY
 "b"."Date" desc,
 "Table"."Date" DESC;

Output:

Code Date
007 2022-01-04T00:00:00Z
007 2022-01-01T00:00:00Z
007 2021-12-19T00:00:00Z
002 2022-01-03T00:00:00Z
002 2021-12-02T00:00:00Z
002 2021-11-15T00:00:00Z
035 2022-01-01T00:00:00Z
035 2021-11-30T00:00:00Z
035 2021-05-03T00:00:00Z
001 2021-12-31T00:00:00Z
022 2021-12-07T00:00:00Z
076 2021-11-19T00:00:00Z

Edit 3

Shorter solution using max over partition by.

SELECT
"Code",
"Date"
FROM
"Table"
ORDER BY
 max("Date") over (partition by "Code") DESC,
 "Table"."Date" DESC
;

Output:

Code Date
007 2022-01-04T00:00:00Z
007 2022-01-01T00:00:00Z
007 2021-12-19T00:00:00Z
002 2022-01-03T00:00:00Z
002 2021-12-02T00:00:00Z
002 2021-11-15T00:00:00Z
035 2022-01-01T00:00:00Z
035 2021-11-30T00:00:00Z
035 2021-05-03T00:00:00Z
001 2021-12-31T00:00:00Z
022 2021-12-07T00:00:00Z
076 2021-11-19T00:00:00Z

CodePudding user response:

Edit 2 and the more elegant Edit 3 from @DavidLukas help me achieve what I needed

  •  Tags:  
  • Related