I have two tables in Hive (arch and noarch) with the following structure:
| Table1Arch | Table2NoArch |
|---|---|
| tr_id | tr_id |
| res_id | res_id |
| info_json | info_json |
| created_at | |
| updated_at |
I need to get a full data view arch noarch and join them by res_id.
I tried to do different variations of left joins but I was getting either results from arch and nothing from noarch or vice versa. I guess I should have used union all, but struggling to write it correctly.
Could you please help me with the right query?
EDIT:
I'd like to get kinda united view for these two tables when I search for particular res_ids.
Say I have the following data:
| Entry1Arch | Entry2Arch |
|---|---|
| 1 | 2 |
| 111 | 222 |
| {"something 1"} | {"something 2"} |
| Entry3NoArch | Entry4NoArch |
|---|---|
| 3 | 4 |
| 333 | 444 |
| {"something 3"} | {"something 4"} |
| 2021-10-03 21:01:44.0 | 2021-10-04 21:02:43.0 |
| 2021-10-03 21:01:44.0 | 2021-10-04 21:02:43.0 |
The ultimate goal is to get full data from both tables: 111 222 333 444).
CodePudding user response:
You can use UNION ALL:
select tr_id, res_id, info_json, created_at, updated_at, src
from
(select tr_id, res_id, info_json, created_at, updated_at, 'NoArch' as src
from Table2NoArch
union all
select tr_id, res_id, info_json, null created_at, null updated_at, 'Arch' as src
from Table1Arch
)u
where res_id in (111,333,444)
created_at and updated_at are absent in one Table1Arch, NULLs are selected, you can use current_timestamp or current_date instead.
Added src column, so you can easily find out the source of data.
