Above is a sample of the data in my table. Due to bad input design, a person with the same e-mail can have multiple user_id values, and this can happen for multiple e-mails. The example above shows only one e-mail.
Is there an easy SQL solution to select the user_id for each selected_equivalent_parent_id, based on the latest timestamp?
Using the example above, the ideal SQL output should return only rows 1 and 2, where the user_id in rows 1 and 2 has the latest max_completed_on_timestamp for each of the 2 unique selected_equivalent_parent_id values.
I would appreciate any guidance on how best to do this, thank you!
I am not sure whether I should use another MAX() aggregation, PARTITION BY, or some sort of LEFT JOIN to exclude all the other rows.
CodePudding user response:
Taking the data you shared as a CTE:
WITH data as(
SELECT '10183244' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
'303806' as selected_equivalent_parent_id, '2021-12-29 23:22:12.000 08:00' as max_completed_on_timestamp UNION ALL
SELECT '10183244' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
'308856' as selected_equivalent_parent_id, '2021-12-29 23:22:12.000 08:00' as max_completed_on_timestamp UNION ALL
SELECT '3194303' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
'303806' as selected_equivalent_parent_id, '2021-12-29 19:41:12.000 08:00' as max_completed_on_timestamp UNION ALL
SELECT '2980649' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
'303806' as selected_equivalent_parent_id, '2021-12-28 23:22:12.000 08:00' as max_completed_on_timestamp UNION ALL
SELECT '2980649' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
'308856' as selected_equivalent_parent_id, NULL UNION ALL
SELECT '3194303' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
'308856' as selected_equivalent_parent_id, NULL
),
I wrote two subqueries: d groups the rows by all the descriptive columns and takes each group's maximum timestamp, and t takes the maximum timestamp for each selected_equivalent_parent_id.
d as(
  SELECT user_id, dummy_email, country_name, country_code, region_name, selected_equivalent_parent_id,
    max(max_completed_on_timestamp) as max_completed_on_timestamp
  FROM data
  GROUP BY selected_equivalent_parent_id, user_id, dummy_email, country_code, country_name, region_name
),
t as(
  select max(max_completed_on_timestamp) as max_completed_on_timestamp, selected_equivalent_parent_id
  from data
  group by selected_equivalent_parent_id
)
Then I joined the two subqueries on selected_equivalent_parent_id and kept only the rows where the timestamps match:
select d.user_id, d.dummy_email, d.country_name, d.country_code, d.region_name, d.selected_equivalent_parent_id, d.max_completed_on_timestamp
from t
inner join d on d.selected_equivalent_parent_id = t.selected_equivalent_parent_id
where d.max_completed_on_timestamp = t.max_completed_on_timestamp
If the data is already in a table, the query should look like the one below:
WITH d as(
  SELECT user_id, dummy_email, country_name, country_code, region_name, selected_equivalent_parent_id,
    max(max_completed_on_timestamp) as max_completed_on_timestamp
  FROM `dataset.table`
  GROUP BY selected_equivalent_parent_id, user_id, dummy_email, country_code, country_name, region_name
),
t as(
  select max(max_completed_on_timestamp) as max_completed_on_timestamp, selected_equivalent_parent_id
  from `dataset.table`
  group by selected_equivalent_parent_id
)
select d.user_id, d.selected_equivalent_parent_id, d.max_completed_on_timestamp
from t
inner join d on d.selected_equivalent_parent_id = t.selected_equivalent_parent_id
where d.max_completed_on_timestamp = t.max_completed_on_timestamp
The result of this query is:

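Since the question mentions PARTITION BY, here is a minimal sketch of the same idea with a window function instead of a self-join. It is only an illustration under stated assumptions: the names sample_rows, ranked and rn are hypothetical, the inline rows keep just three of the columns from the sample data, and the timestamps stay as strings exactly as shared, so "latest" means the greatest string value.

```sql
-- Hypothetical inline sample: three columns taken from the data shared above.
WITH sample_rows AS (
  SELECT '10183244' AS user_id, '303806' AS selected_equivalent_parent_id,
         '2021-12-29 23:22:12.000 08:00' AS max_completed_on_timestamp UNION ALL
  SELECT '10183244', '308856', '2021-12-29 23:22:12.000 08:00' UNION ALL
  SELECT '3194303',  '303806', '2021-12-29 19:41:12.000 08:00' UNION ALL
  SELECT '2980649',  '303806', '2021-12-28 23:22:12.000 08:00' UNION ALL
  SELECT '2980649',  '308856', CAST(NULL AS STRING) UNION ALL
  SELECT '3194303',  '308856', CAST(NULL AS STRING)
),
ranked AS (
  SELECT
    *,
    -- Number the rows inside each parent id, latest timestamp first.
    -- In BigQuery, NULLs sort last under DESC, so they are never numbered 1
    -- while a non-NULL timestamp exists in the same partition.
    ROW_NUMBER() OVER (
      PARTITION BY selected_equivalent_parent_id
      ORDER BY max_completed_on_timestamp DESC
    ) AS rn
  FROM sample_rows
)
SELECT user_id, selected_equivalent_parent_id, max_completed_on_timestamp
FROM ranked
WHERE rn = 1
```

One difference from the join above: if two users share the same latest timestamp for a parent id, the join returns both rows, while ROW_NUMBER() keeps exactly one of them, chosen arbitrarily unless the ORDER BY is extended.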
CodePudding user response:
Consider the simple approach below:
select as value array_agg(t order by max_completed_on_timestamp desc)[offset(0)]
from your_table t
group by selected_equivalent_parent_id
If applied to the sample data in your question, the output is:




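A possible refinement of the array_agg approach, offered as a sketch rather than a required change: when two rows for the same selected_equivalent_parent_id share the latest timestamp, the row that lands at offset(0) is not determined. Adding a secondary sort key makes the choice repeatable, and LIMIT 1 inside ARRAY_AGG keeps the intermediate array to a single element. Using user_id as the tie-breaker is an assumption made here for illustration; your_table is the placeholder name from the answer.

```sql
select as value
  array_agg(
    t
    -- Latest timestamp first (NULLs sort last under DESC in BigQuery);
    -- user_id is an assumed tie-breaker so repeated runs pick the same row.
    order by max_completed_on_timestamp desc, user_id
    limit 1
  )[offset(0)]
from your_table t
group by selected_equivalent_parent_id
```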