Inner Join - special time conditions

Given an hourly table A with full heart_rate records, e.g.:

User    Hour  Heart_rate
Joe      1       60
Joe      2       70
Joe      3       72
Joe      4       75
Joe      5       68
Joe      6       71
Joe      7       78
Joe      8       83
Joe      9       85
Joe     10       80

And a table B with the subset of hours in which a purchase happened, e.g.:

User    Hour    Purchase  
Joe      3       'Soda'
Joe      9       'Coke'
Joe     10     'Doughnut'

I want to keep only those records from A that are in B or at most 2 hours before a record in B, without duplication, while preserving both the heart_rate from A and the item purchased from B, so the outcome is:

User    Hour  Heart_rate Purchase 
Joe      1       60         null
Joe      2       70         null
Joe      3       72        'Soda'
Joe      7       78         null 
Joe      8       83         null
Joe      9       85        'Coke'
Joe     10       80       'Doughnut'

How can the result be achieved with an inner join, without duplication (in this case of the hours 8 and 9)? (This is an MWE; assume multiple users and timestamps instead of hours.)

The obvious solution is to combine:

Inner join with deduplication
Left join

Can this be achieved in a more elegant way?
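
For reference, the combination described above could look roughly like the sketch below. It is only one reading of "inner join with deduplication, then left join", run here through Python's built-in sqlite3 module so it can be tried without a database server. The lowercase table and column names are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (user TEXT, hour INTEGER, heart_rate INTEGER);
CREATE TABLE b (user TEXT, hour INTEGER, purchase TEXT);
INSERT INTO a VALUES ('Joe',1,60),('Joe',2,70),('Joe',3,72),('Joe',4,75),('Joe',5,68),
                     ('Joe',6,71),('Joe',7,78),('Joe',8,83),('Joe',9,85),('Joe',10,80);
INSERT INTO b VALUES ('Joe',3,'Soda'),('Joe',9,'Coke'),('Joe',10,'Doughnut');
""")

OBVIOUS_SQL = """
SELECT d.user, d.hour, d.heart_rate, b.purchase
FROM (
    -- Inner join on the 2-hour window; DISTINCT drops rows matched by several purchases
    SELECT DISTINCT a.user, a.hour, a.heart_rate
    FROM a
    INNER JOIN b ON b.user = a.user AND a.hour BETWEEN b.hour - 2 AND b.hour
) AS d
-- Left join on the exact hour brings the purchased item back
LEFT JOIN b ON b.user = d.user AND b.hour = d.hour
ORDER BY d.user, d.hour
"""

rows = conn.execute(OBVIOUS_SQL).fetchall()
for row in rows:
    print(row)
```

This needs two passes over B, which is the part a more compact query would try to avoid.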

CodePudding user response：

Your solutions should work and sound good.

There is another way, using three nested SELECT statements.

The inner SELECT combines both tables with UNION ALL. Because UNION ALL requires the same columns on both sides, fields that exist in only one table have to be defined in the other one as well and set to null. The column hour_eat is added to record the hour of each purchase. By sorting this combined table, we can achieve that each row from table B is directly followed by the next row from table A.

In the middle SELECT statement, lag(Purchase) fetches the Purchase value of the preceding row. If we look only at the rows from the 1st table, the Purchase value from the 2nd table is now in the right place. This comes in handy if timestamps rather than fixed hours are used. The line with last_value calculates the time between the heart-rate measurement and the next purchase (zero or negative, in hours).

The outer SELECT filters the rows of interest: only rows from the 1st table, and only those within the 2 hours before a purchase.

WITH
heart_tbl AS (
    # WITH OFFSET keeps the array order, which row_number() over() does not guarantee
    SELECT "Joe" AS user, pos + 1 AS hour, heart_rate
    FROM unnest([60,70,72,75,68,71,78,83,85,80]) AS heart_rate WITH OFFSET AS pos
),
eat_tbl AS (
    SELECT "Joe" AS user, 3 AS hour, 'Soda' AS purchase
    UNION ALL SELECT "Joe", 9, 'Coke'
    UNION ALL SELECT "Joe", 10, 'Doughnut'
)

SELECT user, hour, heart_rate, purchase_, hours_till_purchase
FROM
(
    SELECT *,
        # purchase of the preceding row; purchase rows sort before measurements of the same hour
        lag(purchase) OVER (PARTITION BY user ORDER BY hour, heart_rate IS NOT NULL) AS purchase_,
        # measurement hour minus the hour of the next purchase (0 or negative)
        hour - last_value(hour_eat IGNORE NULLS) OVER (PARTITION BY user ORDER BY hour DESC, heart_rate IS NOT NULL) AS hours_till_purchase
    FROM   # combine both tables into one table
    (
        SELECT user, hour, heart_rate, null AS purchase, null AS hour_eat FROM heart_tbl
        UNION ALL
        SELECT user, hour, null AS heart_rate, purchase, hour FROM eat_tbl
    )
)
WHERE heart_rate IS NOT NULL AND hours_till_purchase >= -2
ORDER BY user, hour
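
The query above relies on BigQuery syntax (unnest, IGNORE NULLS). On an engine without IGNORE NULLS, the same idea can be sketched with MIN() as a window function, since the next purchase hour is the smallest purchase hour at or after the current row. The version below is a hedged port, not the original query: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), and the CTE names combined and marked are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE heart_tbl (user TEXT, hour INTEGER, heart_rate INTEGER);
CREATE TABLE eat_tbl (user TEXT, hour INTEGER, purchase TEXT);
INSERT INTO heart_tbl VALUES ('Joe',1,60),('Joe',2,70),('Joe',3,72),('Joe',4,75),('Joe',5,68),
                             ('Joe',6,71),('Joe',7,78),('Joe',8,83),('Joe',9,85),('Joe',10,80);
INSERT INTO eat_tbl VALUES ('Joe',3,'Soda'),('Joe',9,'Coke'),('Joe',10,'Doughnut');
""")

PORTABLE_SQL = """
WITH combined AS (
    -- Stack both tables; columns missing on one side are filled with NULL
    SELECT user, hour, heart_rate, NULL AS purchase, NULL AS hour_eat FROM heart_tbl
    UNION ALL
    SELECT user, hour, NULL, purchase, hour FROM eat_tbl
),
marked AS (
    SELECT *,
        -- Purchase rows sort before the measurement of the same hour,
        -- so LAG hands the purchase to that measurement
        LAG(purchase) OVER (PARTITION BY user
                            ORDER BY hour, heart_rate IS NOT NULL) AS purchase_,
        -- Smallest purchase hour at or after the current hour = next purchase
        hour - MIN(hour_eat) OVER (PARTITION BY user
                                   ORDER BY hour DESC) AS hours_till_purchase
    FROM combined
)
SELECT user, hour, heart_rate, purchase_, hours_till_purchase
FROM marked
WHERE heart_rate IS NOT NULL AND hours_till_purchase >= -2
ORDER BY user, hour
"""

rows = conn.execute(PORTABLE_SQL).fetchall()
for row in rows:
    print(row)
```

Rows with no upcoming purchase get a NULL difference and are dropped by the WHERE clause.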

CodePudding user response：

You could use an INNER join of the tables and conditional aggregation for the deduplication:

SELECT a.User, a.Hour, a.Heart_rate, 
       MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) Purchase
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour
WHERE a.User = 'Joe' -- remove this line if you want results for all users
GROUP BY a.User, a.Hour, a.Heart_rate;
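
To try the conditional-aggregation query on the sample data, a minimal sketch with Python's built-in sqlite3 module might look like this. The second user Ann and her values are invented purely to show that the query works across users once the WHERE line is removed; the lowercase names are also assumptions of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (user TEXT, hour INTEGER, heart_rate INTEGER);
CREATE TABLE b (user TEXT, hour INTEGER, purchase TEXT);
INSERT INTO a VALUES ('Joe',1,60),('Joe',2,70),('Joe',3,72),('Joe',4,75),('Joe',5,68),
                     ('Joe',6,71),('Joe',7,78),('Joe',8,83),('Joe',9,85),('Joe',10,80),
                     -- hypothetical second user, not part of the question
                     ('Ann',1,65),('Ann',2,66),('Ann',3,67),('Ann',4,90);
INSERT INTO b VALUES ('Joe',3,'Soda'),('Joe',9,'Coke'),('Joe',10,'Doughnut'),
                     ('Ann',4,'Tea');
""")

AGG_SQL = """
SELECT a.user, a.hour, a.heart_rate,
       -- Only the purchase made in the same hour survives the CASE;
       -- MAX collapses the duplicated join rows into one
       MAX(CASE WHEN a.hour = b.hour THEN b.purchase END) AS purchase
FROM a INNER JOIN b
  ON b.user = a.user AND a.hour BETWEEN b.hour - 2 AND b.hour
GROUP BY a.user, a.hour, a.heart_rate
ORDER BY a.user, a.hour
"""

rows = conn.execute(AGG_SQL).fetchall()
for row in rows:
    print(row)
```

Hours 8 and 9 each match two purchases in the join, and the GROUP BY reduces them to a single row per user and hour.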

Or with MAX() window function:

SELECT DISTINCT a.*, 
       MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) OVER (PARTITION BY a.User, a.Hour) Purchase
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour;

See the demo (for MySQL, but it is standard SQL).
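
Since the question mentions timestamps instead of hours, the join condition can be adapted by replacing `b.Hour - 2` with a timestamp two hours earlier. The sketch below assumes timestamps stored as 'YYYY-MM-DD HH:MM:SS' text in SQLite and uses SQLite's datetime() modifier; other engines would use their own interval syntax. The column name ts and the sample values are hypothetical, and the purchase is attached only when the two timestamps match exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (user TEXT, ts TEXT, heart_rate INTEGER);
CREATE TABLE b (user TEXT, ts TEXT, purchase TEXT);
-- hypothetical sample values, not taken from the question
INSERT INTO a VALUES
    ('Joe','2024-01-01 08:00:00',60), ('Joe','2024-01-01 09:00:00',70),
    ('Joe','2024-01-01 10:00:00',72), ('Joe','2024-01-01 11:00:00',75),
    ('Joe','2024-01-01 12:00:00',68), ('Joe','2024-01-01 13:00:00',71);
INSERT INTO b VALUES ('Joe','2024-01-01 12:00:00','Soda');
""")

TS_SQL = """
SELECT a.user, a.ts, a.heart_rate,
       MAX(CASE WHEN a.ts = b.ts THEN b.purchase END) AS purchase
FROM a INNER JOIN b
  ON b.user = a.user
 -- keep measurements from 2 hours before the purchase up to the purchase itself
 AND a.ts BETWEEN datetime(b.ts, '-2 hours') AND b.ts
GROUP BY a.user, a.ts, a.heart_rate
ORDER BY a.user, a.ts
"""

rows = conn.execute(TS_SQL).fetchall()
for row in rows:
    print(row)
```

If purchases rarely coincide exactly with a measurement, the exact-match CASE would leave the purchase column empty, and an approach that carries the purchase to the next measurement may fit better.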