I'm trying to migrate a SQL query to a SparkSQL one. My main issue is with string_split function which I'm trying to replace with explode combined with split. Still giving me some errors..
The original query looks like this:
select *
from t1
where exists (
select *
from t2
where exists (
select *
from string_split(t2.concat_id, ',')
where t1.id = value
) and exists (
select *
from string_split(t2.concat_name, ',')
where t1.name = value
)
)
I moved it to something like this:
SELECT *
FROM t1
WHERE exists (
SELECT * FROM t2
WHERE exists ( SELECT * FROM (SELECT explode(split(t2.concat_id,',')) as value)
WHERE t1.id = value
) AND exists (
SELECT * FROM (SELECT explode(split(t2.concat_name,',')) as value)
WHERE t1.name = value
)
)
My current errors is saying:
AnalysisException: cannot resolve 't1.id' given input columns: [__auto_generated_subquery_name.value];
Table1 Example:
| ID | NAME |
| RO1 | Arnold |
| RO2 | Ed |
| RO3 | Sal |
| RO4 | Teus |
| RO11 | Haus |
Table2 Example:
| Concat_ID | Concat_Name |
| RO3,RO4 | Teus,Port |
| RO10, RO15 | Rar,Tar |
| RO2, RO6 | Ed, Kev |
CodePudding user response:
You can use exists function to check the values directly in array without having to explode it:
SELECT *
FROM t1
WHERE exists (
SELECT * FROM t2
WHERE exists(split(t2.concat_id, ','), x -> x = t1.id)
AND exists(split(t2.concat_name, ','), x -> x = t1.name)
)
The other way is using LATERAL VIEW if you explode the splited string:
SELECT *
FROM t1
WHERE exists (
SELECT * FROM t2
LATERAL VIEW EXPLODE(split(t2.concat_id, ',')) AS c_id
LATERAL VIEW EXPLODE(split(t2.concat_name,',')) AS c_name
WHERE t1.id = c_id AND t1.name = c_name
)
CodePudding user response:
Create the data:
create table t2 (ids string, names string);
insert into t2 values ("john,smith", "1,2");
insert into t2 values ("jane,doe", "3,4");
create table t1 (id string, name string);
insert into t1 values ("john", "1");
explode the columns into a cartesian product of rows and join t1. (Lateral view helps to change columns into rows and is the magic in the sauce here)
with t_two as
(
select * from t2
LATERAL VIEW explode(split(t2.ids,',')) as my_ids
LATERAL VIEW explode(split(t2.names,',')) as my_names
)
select t1.*
from
t1 join t_two
on
t_two.my_names = t1.name
and
t_two.my_ids = t1.id ;
