SQL: Select the correct user_id / parent_id pairing based on the MAX completed timestamp

Time:01-19

Table sample

Above is a sample of the type of data in my table. Due to bad input design, a person with a single e-mail address can have multiple user_id values, and this can repeat for many e-mail addresses. The example above shows the rows for only one e-mail address.

Is there an easy SQL solution to select the user_id for each selected_equivalent_parent_id, based on the latest timestamp?

Using the example above, the ideal SQL output should return only rows 1 and 2, because the user_id in rows 1 and 2 has the latest max_completed_on_timestamp for each of the 2 unique selected_equivalent_parent_id values.

I would appreciate any guidance on how best to do this, thank you!

I am not sure whether I should be using another MAX() aggregation, PARTITION BY, or some sort of LEFT JOIN to exclude all the other rows.

CodePudding user response:

With the data you shared written as a CTE:

WITH data as(
   SELECT '10183244' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
   '303806' as selected_equivalent_parent_id, '2021-12-29 23:22:12.000 08:00' as max_completed_on_timestamp UNION ALL
   SELECT '10183244' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
   '308856' as selected_equivalent_parent_id, '2021-12-29 23:22:12.000 08:00' as max_completed_on_timestamp UNION ALL
   SELECT '3194303' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
   '303806' as selected_equivalent_parent_id, '2021-12-29 19:41:12.000 08:00' as max_completed_on_timestamp UNION ALL
   SELECT '2980649' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
   '303806' as selected_equivalent_parent_id, '2021-12-28 23:22:12.000 08:00' as max_completed_on_timestamp UNION ALL
   SELECT '2980649' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
   '308856' as selected_equivalent_parent_id, NULL  UNION ALL
   SELECT '3194303' as user_id, '[email protected]' as dummy_email, 'India' as country_name, 'IN' as country_code, 'APAC' as region_name,
   '308856' as selected_equivalent_parent_id, NULL
),

I wrote two subqueries: one that groups rows sharing the same values, and another that takes only the maximum timestamp for each selected_equivalent_parent_id.

d as(
   SELECT user_id, dummy_email, country_name, country_code, region_name,
          selected_equivalent_parent_id, max(max_completed_on_timestamp) as max_completed_on_timestamp
   FROM data
   GROUP BY selected_equivalent_parent_id, user_id, dummy_email, country_code, country_name, region_name
),
t as(
   select max(max_completed_on_timestamp) as max_completed_on_timestamp, selected_equivalent_parent_id
   from data
   group by selected_equivalent_parent_id
)

Then I joined the two subqueries on rows that share the same selected_equivalent_parent_id and the same completed timestamp:

select d.user_id, d.dummy_email, d.country_name, d.country_code, d.region_name,
       d.selected_equivalent_parent_id, d.max_completed_on_timestamp
from t inner join d on d.selected_equivalent_parent_id = t.selected_equivalent_parent_id
where d.max_completed_on_timestamp = t.max_completed_on_timestamp

The result is the following:

[image: query result]

If the data is in a table, the query should look like the one below:

WITH d as(
   SELECT user_id, dummy_email, country_name, country_code, region_name,
          selected_equivalent_parent_id, max(max_completed_on_timestamp) as max_completed_on_timestamp
   FROM `dataset.table`
   GROUP BY selected_equivalent_parent_id, user_id, dummy_email, country_code, country_name, region_name
),
t as(
   select max(max_completed_on_timestamp) as max_completed_on_timestamp, selected_equivalent_parent_id
   from `dataset.table`
   group by selected_equivalent_parent_id
)
select d.user_id, d.selected_equivalent_parent_id, d.max_completed_on_timestamp
from t inner join d on d.selected_equivalent_parent_id = t.selected_equivalent_parent_id
where d.max_completed_on_timestamp = t.max_completed_on_timestamp

The result of this query is the following:

[image: query result]
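Since the question mentions PARTITION BY, here is a minimal sketch of that alternative: rank the rows inside each selected_equivalent_parent_id by timestamp and keep the first one. It is not part of the answer above. To keep it runnable without a BigQuery project, it runs the SQL through Python's built-in sqlite3 module (assuming SQLite 3.25 or newer for window functions); the table name `data`, the helper `latest_per_parent`, and the reduced three-column layout are assumptions made for illustration. The timestamps are kept as the strings shown in the question, so ordering is lexicographic.

```python
import sqlite3

# Sample rows from the question, reduced to the three columns that matter:
# (user_id, selected_equivalent_parent_id, max_completed_on_timestamp)
ROWS = [
    ("10183244", "303806", "2021-12-29 23:22:12.000 08:00"),
    ("10183244", "308856", "2021-12-29 23:22:12.000 08:00"),
    ("3194303", "303806", "2021-12-29 19:41:12.000 08:00"),
    ("2980649", "303806", "2021-12-28 23:22:12.000 08:00"),
    ("2980649", "308856", None),
    ("3194303", "308856", None),
]

# Number the rows within each parent id, newest timestamp first, and keep
# only row number 1. SQLite sorts NULL last under DESC, so rows without a
# timestamp are never preferred over rows that have one.
QUERY = """
SELECT user_id, selected_equivalent_parent_id, max_completed_on_timestamp
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY selected_equivalent_parent_id
               ORDER BY max_completed_on_timestamp DESC
           ) AS rn
    FROM data
)
WHERE rn = 1
ORDER BY selected_equivalent_parent_id
"""


def latest_per_parent(rows):
    """Load the rows into an in-memory table and run the ranking query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data ("
        "user_id TEXT, "
        "selected_equivalent_parent_id TEXT, "
        "max_completed_on_timestamp TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = latest_per_parent(ROWS)
for row in result:
    print(row)
```

On the sample data this keeps user_id 10183244 for both parent ids. One difference from the join: if two users tie on the latest timestamp for a parent id, the join returns both rows, while ROW_NUMBER keeps exactly one, and which one is not defined unless a tie-breaker column is added to the ORDER BY.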

CodePudding user response:

Consider the simple approach below:

select as value array_agg(t order by max_completed_on_timestamp desc)[offset(0)]
from your_table t
group by selected_equivalent_parent_id          

If applied to the sample data in your question, the output is:

[image: query output]
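To make the array_agg query above easier to follow, here is a rough Python sketch of what it does: group the rows by selected_equivalent_parent_id, sort each group by timestamp in descending order, and take the element at offset 0. The function name `first_per_group` and the dictionary row layout are assumptions for illustration only. Rows with a NULL timestamp are sorted last, which matches BigQuery's default placement of NULLs under ORDER BY ... DESC.

```python
from collections import defaultdict

# Sample rows from the question, reduced to the relevant columns.
ROWS = [
    {"user_id": "10183244", "selected_equivalent_parent_id": "303806",
     "max_completed_on_timestamp": "2021-12-29 23:22:12.000 08:00"},
    {"user_id": "10183244", "selected_equivalent_parent_id": "308856",
     "max_completed_on_timestamp": "2021-12-29 23:22:12.000 08:00"},
    {"user_id": "3194303", "selected_equivalent_parent_id": "303806",
     "max_completed_on_timestamp": "2021-12-29 19:41:12.000 08:00"},
    {"user_id": "2980649", "selected_equivalent_parent_id": "303806",
     "max_completed_on_timestamp": "2021-12-28 23:22:12.000 08:00"},
    {"user_id": "2980649", "selected_equivalent_parent_id": "308856",
     "max_completed_on_timestamp": None},
    {"user_id": "3194303", "selected_equivalent_parent_id": "308856",
     "max_completed_on_timestamp": None},
]


def first_per_group(rows):
    """Return, for each parent id, the row with the latest timestamp."""
    # GROUP BY selected_equivalent_parent_id
    groups = defaultdict(list)
    for row in rows:
        groups[row["selected_equivalent_parent_id"]].append(row)

    picked = []
    for parent_id in sorted(groups):
        # ORDER BY max_completed_on_timestamp DESC, with NULLs last:
        # the first key element is True for non-NULL timestamps, so with
        # reverse=True those rows come before the NULL ones.
        ordered = sorted(
            groups[parent_id],
            key=lambda r: (
                r["max_completed_on_timestamp"] is not None,
                r["max_completed_on_timestamp"] or "",
            ),
            reverse=True,
        )
        # [offset(0)]: keep only the first element of the ordered array
        picked.append(ordered[0])
    return picked


picked = first_per_group(ROWS)
for row in picked:
    print(row["user_id"], row["selected_equivalent_parent_id"])
```

Because only the first element of each group is kept, this returns exactly one row per parent id, even when several users share the latest timestamp.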
