Home > Mobile >  Need help in using LISTAGG or REGEXP_REPLACE with DISTINCT function
Need help in using LISTAGG or REGEXP_REPLACE with DISTINCT function

Time:01-21

SELECT  distinct ID,LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') 
WITHIN GROUP(ORDER BY CAST(PAGE_NBR as INT)) AS No_Of_Views
FROM db_name.schema_name.tbl
WHERE date_created = '2022-01-21'
AND flg_col = '0'
AND src_id NOT IN ('0','3','4','5')
AND ID='0127435345345'
group by ID;

When I run the above query, I'm getting distinct values

21/01/2022 20:21:17,21/01/2022 20:21:31,21/01/2022 20:22:23,21/01/2022 20:22:33

When I try to run the same query using CTE, I'm not getting distinct results. Is there anyway to form a query using CTE that will fetch distinct values:

WITH CTE AS (
SELECT
 ID,CREATE_TS,
PAGE_NBR::INT AS PAGE_NBR
FROM db_name.schema_name.tbl 
WHERE flg_col = '0'
AND src_id NOT IN ('0','3','4','5')
AND CREATE_DT = ''2022-21-01''
AND ID='0127435345345'
ORDER BY PAGE_NBR
)
SELECT
 ID,
LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY CREATE_TS) AS No_Of_Views 
FROM CTE
GROUP BY ID;

When I use a distinct inside LISTAGG ie. LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') I'm getting SQL compilation error: [CTE.CREATE_TS] is not a valid order by expression

I want the result same as the first query but I need to try the second method..

CodePudding user response:

First of all, your second statement isn't same with the first one. The first one has:

LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') 
WITHIN GROUP(ORDER BY CAST(PAGE_NBR as INT)) AS No_Of_Views

The second one has:

LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY CREATE_TS) AS No_Of_Views 

About the DISTINCT error, as it's mentioned in the docs, if you specify DISTINCT and WITHIN GROUP, both must refer to the same column.

https://docs.snowflake.com/en/sql-reference/functions/listagg.html#usage-notes

So your last statement should be like this:

...
SELECT
 ID,
LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss')) AS No_Of_Views 
FROM CTE
GROUP BY ID;

CodePudding user response:

hmm. so I grabbed you first SQL, and made up some dummy data, and stripped the parts that did not really matter out:

WITH data AS (
  SELECT id, to_timestamp(CREATE_TS) as CREATE_TS, PAGE_NBR  FROM values
  ( 1, '2022-01-01 01:23:34', '1'),
  ( 1, '2022-01-02 01:23:34', '2'),
  ( 2, '2022-01-03 01:23:34', '3'),
  ( 3, '2022-01-04 01:23:34', '4')
  v( id, CREATE_TS, PAGE_NBR )
)
SELECT distinct 
    ID,
    LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY CAST(PAGE_NBR as INT)) AS No_Of_Views
FROM data
group by ID;

and I get:

ID NO_OF_VIEWS
2 01/03/2022 01:23:34
3 01/04/2022 01:23:34
1 01/01/2022 01:23:34,01/02/2022 01:23:34

then I poke in the second block of SQL, pull all the same bit that don't mean anything, and have:

WITH data AS (
  SELECT id, to_timestamp(CREATE_TS) as CREATE_TS, PAGE_NBR  FROM values
    ( 1, '2022-01-01 01:23:34', '1'),
    ( 1, '2022-01-02 01:23:34', '2'),
    ( 2, '2022-01-03 01:23:34', '3'),
    ( 3, '2022-01-04 01:23:34', '4')
  v( id, CREATE_TS, PAGE_NBR )
), CTE AS (
    SELECT
      ID, 
      CREATE_TS,
      PAGE_NBR::INT AS PAGE_NBR
    FROM data
)
SELECT
 ID,
LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY CREATE_TS) AS No_Of_Views 
FROM CTE
GROUP BY ID;

and that still works:

ID NO_OF_VIEWS
2 01/03/2022 01:23:34
3 01/04/2022 01:23:34
1 01/01/2022 01:23:34,01/02/2022 01:23:34

In your first query, the distinct is doing nothing, as you code has a group by clause.

In your prior question you wanted distinct values in the array, but to order them by an alternative column ( PAGE_NBR ) you cannot use the clause DISTINCT in the LIST_AGG, thus why Lukasz soultion used a CTE to duplicate the data..

so show that here:

WITH data AS (
  SELECT id, to_timestamp(CREATE_TS) as CREATE_TS, PAGE_NBR  FROM values
    ( 1, '2022-01-01 01:23:34', '1'),
    ( 1, '2022-01-01 01:23:34', '2'),
    ( 1, '2022-01-01 01:23:34', '3'),
    ( 1, '2022-01-02 01:23:34', '2'),
    ( 2, '2022-01-03 01:23:34', '3'),
    ( 3, '2022-01-04 01:23:34', '4')
  v( id, CREATE_TS, PAGE_NBR )
), cte AS (
  SELECT ID, CREATE_TS, PAGE_NBR::INT AS VISIT_PAGE_NBR
  FROM data
  --WHERE FLG_COLUMN = '0'
  --  AND SOURCE_CD NOT IN ('1','2','3','4')
  --  AND DATE_CR = '2022-01-01'
  QUALIFY ROW_NUMBER() OVER(PARTITION BY ID,CREATE_TS ORDER BY VISIT_PAGE_NBR::INT) = 1
)
SELECT ID,
    LISTAGG(to_varchar(CREATE_TS, 'mm/dd/yyyy hh:mi:ss'), ',') WITHIN GROUP(ORDER BY VISIT_PAGE_NBR) AS No_Of_Views
FROM cte
GROUP BY ID;

gives:

ID NO_OF_VIEWS
1 01/01/2022 01:23:34,01/02/2022 01:23:34
2 01/03/2022 01:23:34
3 01/04/2022 01:23:34

the

QUALIFY ROW_NUMBER() OVER(PARTITION BY ID,CREATE_TS ORDER BY VISIT_PAGE_NBR::INT) = 1 is doing the task of only allowing distinct pairs of ID,CREATE_TS

by numbering them all, and the filtering out any that are not the first row.

Thus when the outer LIST_AGG is used the DISTINCT you originally had can be dropped. and the order by VISIT_PAGE_NBR can be used.

  •  Tags:  
  • Related