I am trying to retrieve records based on a custom field "ci_ku". For the same values of "ci_ku" we will be having multiple "l1m_visits", and I want to retrieve the minimum value of "l1mvisits" for each "ci_ku". and i want to get the ci_ku by removing the string between 1st and 2nd underscore(-) in ku, how to write a query to get the ci_ku here
Sample Data:
| ku | item | l1m_visits |
|---|---|---|
| 1234-5678-HIJK | 1234 | A |
| 1234-9012-HIJK | 1234 | B |
| 56457-12456-DF-GH-TC | 56457 | D |
Expected Output:
| ku | ci_ku | l1m_visits |
|---|---|---|
| 1234-5678-HIJK | 1234-HIJK | A |
| 56457-12456-DF-GH-TC | 56457-DF-GH-TC | D |
Have tried the query below:
WITH tab_with_ci_ku AS (
select split(ku, '-', 3)ivm_arr,
l1m_visits,
last_refresh_date
FROM db.scema.table
), ranked_visits AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY CONCAT(ivm_arr[2],item) as ci_sku ORDER BY l1m_visits) AS rn
FROM tab_with_ci_ku
)
SELECT sku,ci_ku
FROM ranked_visits
WHERE rn = 1
and facing the following error:
mismatched input 'ci_ku'. Expecting: 'ALTER', 'ANALYZE', 'CALL', 'COMMENT', 'COMMIT', 'CREATE', 'DEALLOCATE', 'DELETE', 'DENY', 'DESC', 'DESCRIBE', 'DROP', 'EXECUTE', 'EXPLAIN', 'GRANT', 'INSERT', 'MERGE', 'PREPARE', 'REFRESH', 'RESET', 'REVOKE', 'ROLLBACK', 'SET', 'SHOW', 'START', 'TRUNCATE', 'UPDATE', 'USE', <query>
CodePudding user response:
I would use a regex replacement to handle the ci_ku requirement. Note that this replacement should happen first, as the call to ROW_NUMBER depends on it.
WITH tab_with_ci_ku AS (
SELECT *, REGEXP_REPLACE(ku, '([^-] )-[^-] -(.*)', '$1-$2') AS ci_ku
FROM hdpsa30.pps.ivm_bands_boomerang_sos
),
ranked_visits AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ci_ku ORDER BY l1m_visits) AS rn
FROM tab_with_ci_ku
)
SELECT ku, ci_ku, l1m_visits
FROM ranked_visits
WHERE rn = 1;
Here is an explanation of the regex:
([^-] )match and capture in$1the first term-match a literal hyphen[^-]match the second term (to be excluded in the replacement)-match a literal hyphen(.*)match and capture in$2the remainder of the SKU
Then we replace with $1-$2 to effectively splice out the second term.
