I have Table 1 and Table 2, where Table 1 has unique customer id and their report date and Table 2 has the entire date range and corresponding values for each date for each customer id. Basically, I want to pull 2 dates before the report date and 2 dates after the report date for each customer id to get the Desired Output Table below. SQL - pull data for date range based on a start date seems to be the closest to what I want, but I don't know how to tweak the answer to suit my requirement.
Below is my code, but it does not work in Hive/Hue SQL. Appreciate your help :)
SELECT b.date, a.report_date, a.customer_id, b.value
FROM table1 a
LEFT JOIN table2 b ON b.date = a.report_date AND b.customer_id = a.customer_id
WHERE b.date IN (DATE_ADD(a.report_date, 2)
OR b.date IN (DATE_SUB(a.report_date, 2)
TABLE 1
| customer id | report date |
|---|---|
| 123 | 15/01/2021 |
| 456 | 20/02/2021 |
TABLE 2
| customer id | date | value |
|---|---|---|
| ... | ... | ... |
| 123 | 13/01/2021 | 10 |
| 123 | 14/01/2021 | 13 |
| 123 | 15/01/2021 | 9 |
| 123 | 16/01/2021 | 19 |
| 123 | 17/01/2021 | 20 |
| ... | ... | ... |
| ... | ... | ... |
| 456 | 18/02/2021 | 4 |
| 456 | 19/02/2021 | 5 |
| 456 | 20/02/2021 | 2 |
| 456 | 21/02/2021 | 9 |
| 456 | 22/02/2021 | 1 |
| ... | ... | ... |
DESIRED OUTPUT TABLE
| date | report date | customer id | value |
|---|---|---|---|
| 13/01/2021 | 15/01/2021 | 123 | 10 |
| 14/01/2021 | 15/01/2021 | 123 | 13 |
| 15/01/2021 | 15/01/2021 | 123 | 9 |
| 16/01/2021 | 15/01/2021 | 123 | 19 |
| 17/01/2021 | 15/01/2021 | 123 | 20 |
| 18/02/2021 | 20/02/2021 | 456 | 4 |
| 19/02/2021 | 20/02/2021 | 456 | 5 |
| 20/02/2021 | 20/02/2021 | 456 | 2 |
| 21/02/2021 | 20/02/2021 | 456 | 9 |
| 22/02/2021 | 20/02/2021 | 456 | 1 |
CodePudding user response:
The problem is in the JOIN segment: you need to JOIN by customer_id and not dates:
ON b.customer_id = a.customer_id
CodePudding user response:
Shouldn't it be
SELECT b.date, a.report_date, a.customer_id, b.value
FROM table1 a
LEFT JOIN table2 b ON b.customer_id = a.customer_id
WHERE
b.date BETWEEN DATE_SUB(a.report_date, 2)
AND DATE_ADD(a.report_date, 2)
I don't really know HIVE SQL but IN is a set/list operator, so you would be looking for the two dates, but the closing brackets are missing, too.
CodePudding user response:
It's not clear that you actually need an outer join. When you do use one you can't filter on the inner table afterward though.
LEFT JOIN table2 b ON
b.customer_id = a.customer_id
AND b.date BETWEEN DATE_SUB(a.report_date, 2) AND DATE_ADD(a.report_date, 2)
