SELECT distinct ID,LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',')
WITHIN GROUP(ORDER BY CAST(PAGE_NBR as INT)) AS No_Of_Views
FROM db_name.schema_name.tbl
WHERE date_created = '2022-01-21'
AND flg_col = '0'
AND src_id NOT IN ('0','3','4','5')
AND ID='0127435345345'
group by ID;
When I run the above query, I'm getting distinct values
21/01/2022 20:21:17,21/01/2022 20:21:31,21/01/2022 20:22:23,21/01/2022 20:22:33
When I try to run the same query using CTE, I'm not getting distinct results. Is there anyway to form a query using CTE that will fetch distinct values:
WITH CTE AS (
SELECT
ID,CREATE_TS,
PAGE_NBR::INT AS PAGE_NBR
FROM db_name.schema_name.tbl
WHERE flg_col = '0'
AND src_id NOT IN ('0','3','4','5')
AND CREATE_DT = ''2022-21-01''
AND ID='0127435345345'
ORDER BY PAGE_NBR
)
SELECT
ID,
LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY CREATE_TS) AS No_Of_Views
FROM CTE
GROUP BY ID;
When I use a distinct inside LISTAGG ie. LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') I'm getting SQL compilation error: [CTE.CREATE_TS] is not a valid order by expression
I want the result same as the first query but I need to try the second method..
CodePudding user response:
First of all, your second statement isn't same with the first one. The first one has:
LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',')
WITHIN GROUP(ORDER BY CAST(PAGE_NBR as INT)) AS No_Of_Views
The second one has:
LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY CREATE_TS) AS No_Of_Views
About the DISTINCT error, as it's mentioned in the docs, if you specify DISTINCT and WITHIN GROUP, both must refer to the same column.
https://docs.snowflake.com/en/sql-reference/functions/listagg.html#usage-notes
So your last statement should be like this:
...
SELECT
ID,
LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss')) AS No_Of_Views
FROM CTE
GROUP BY ID;
CodePudding user response:
hmm. so I grabbed you first SQL, and made up some dummy data, and stripped the parts that did not really matter out:
WITH data AS (
SELECT id, to_timestamp(CREATE_TS) as CREATE_TS, PAGE_NBR FROM values
( 1, '2022-01-01 01:23:34', '1'),
( 1, '2022-01-02 01:23:34', '2'),
( 2, '2022-01-03 01:23:34', '3'),
( 3, '2022-01-04 01:23:34', '4')
v( id, CREATE_TS, PAGE_NBR )
)
SELECT distinct
ID,
LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY CAST(PAGE_NBR as INT)) AS No_Of_Views
FROM data
group by ID;
and I get:
| ID | NO_OF_VIEWS |
|---|---|
| 2 | 01/03/2022 01:23:34 |
| 3 | 01/04/2022 01:23:34 |
| 1 | 01/01/2022 01:23:34,01/02/2022 01:23:34 |
then I poke in the second block of SQL, pull all the same bit that don't mean anything, and have:
WITH data AS (
SELECT id, to_timestamp(CREATE_TS) as CREATE_TS, PAGE_NBR FROM values
( 1, '2022-01-01 01:23:34', '1'),
( 1, '2022-01-02 01:23:34', '2'),
( 2, '2022-01-03 01:23:34', '3'),
( 3, '2022-01-04 01:23:34', '4')
v( id, CREATE_TS, PAGE_NBR )
), CTE AS (
SELECT
ID,
CREATE_TS,
PAGE_NBR::INT AS PAGE_NBR
FROM data
)
SELECT
ID,
LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY CREATE_TS) AS No_Of_Views
FROM CTE
GROUP BY ID;
and that still works:
| ID | NO_OF_VIEWS |
|---|---|
| 2 | 01/03/2022 01:23:34 |
| 3 | 01/04/2022 01:23:34 |
| 1 | 01/01/2022 01:23:34,01/02/2022 01:23:34 |
In your first query, the distinct is doing nothing, as you code has a group by clause.
In your prior question you wanted distinct values in the array, but to order them by an alternative column ( PAGE_NBR ) you cannot use the clause DISTINCT in the LIST_AGG, thus why Lukasz soultion used a CTE to duplicate the data..
so show that here:
WITH data AS (
SELECT id, to_timestamp(CREATE_TS) as CREATE_TS, PAGE_NBR FROM values
( 1, '2022-01-01 01:23:34', '1'),
( 1, '2022-01-01 01:23:34', '2'),
( 1, '2022-01-01 01:23:34', '3'),
( 1, '2022-01-02 01:23:34', '2'),
( 2, '2022-01-03 01:23:34', '3'),
( 3, '2022-01-04 01:23:34', '4')
v( id, CREATE_TS, PAGE_NBR )
), cte AS (
SELECT ID, CREATE_TS, PAGE_NBR::INT AS VISIT_PAGE_NBR
FROM data
--WHERE FLG_COLUMN = '0'
-- AND SOURCE_CD NOT IN ('1','2','3','4')
-- AND DATE_CR = '2022-01-01'
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID,CREATE_TS ORDER BY VISIT_PAGE_NBR::INT) = 1
)
SELECT ID,
LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY VISIT_PAGE_NBR) AS No_Of_Views
FROM cte
GROUP BY ID;
gives:
| ID | NO_OF_VIEWS |
|---|---|
| 1 | 01/01/2022 01:23:34,01/02/2022 01:23:34 |
| 2 | 01/03/2022 01:23:34 |
| 3 | 01/04/2022 01:23:34 |
the
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID,CREATE_TS ORDER BY VISIT_PAGE_NBR::INT) = 1
is doing the task of only allowing distinct pairs of ID,CREATE_TS
by numbering them all, and the filtering out any that are not the first row.
Thus when the outer LIST_AGG is used the DISTINCT you originally had can be dropped. and the order by VISIT_PAGE_NBR can be used.
