Home > Back-end >  Move query from SQL to SparkSQL
Move query from SQL to SparkSQL

Time:02-03

I'm trying to migrate a SQL query to a SparkSQL one. My main issue is with string_split function which I'm trying to replace with explode combined with split. Still giving me some errors..

The original query looks like this:

select *
from t1
where exists (
    select *
    from t2
    where exists (
        select *
        from string_split(t2.concat_id, ',')
        where t1.id = value
    ) and exists (
        select *
        from string_split(t2.concat_name, ',')
        where t1.name = value
    )
)

I moved it to something like this:

SELECT *
FROM t1
WHERE exists ( 
     SELECT * FROM t2 
     WHERE exists ( SELECT * FROM (SELECT explode(split(t2.concat_id,',')) as value)
        WHERE t1.id = value
    ) AND exists (
        SELECT * FROM (SELECT explode(split(t2.concat_name,',')) as value)
        WHERE t1.name = value
    )
)

My current errors is saying:

AnalysisException: cannot resolve 't1.id' given input columns: [__auto_generated_subquery_name.value];

Table1 Example:

| ID   | NAME   |
| RO1  | Arnold |
| RO2  | Ed     |
| RO3  | Sal    |
| RO4  | Teus   |
| RO11 | Haus  |

Table2 Example:

| Concat_ID  | Concat_Name |
| RO3,RO4    | Teus,Port   |
| RO10, RO15 | Rar,Tar     |
| RO2, RO6   | Ed, Kev     |

CodePudding user response:

You can use exists function to check the values directly in array without having to explode it:

SELECT *
FROM t1
WHERE exists ( 
     SELECT * FROM t2 
     WHERE exists(split(t2.concat_id, ','), x -> x = t1.id) 
     AND exists(split(t2.concat_name, ','), x -> x = t1.name)
)

The other way is using LATERAL VIEW if you explode the splited string:

SELECT *
FROM t1
WHERE exists ( 
     SELECT * FROM t2 
     LATERAL VIEW EXPLODE(split(t2.concat_id, ',')) AS c_id
     LATERAL VIEW EXPLODE(split(t2.concat_name,',')) AS c_name
     WHERE  t1.id = c_id AND t1.name = c_name
)

CodePudding user response:

Create the data:

create table t2 (ids string, names string);
insert into t2 values ("john,smith", "1,2");
insert into t2 values ("jane,doe", "3,4");

create table t1 (id string, name string);
insert into t1 values ("john", "1");

explode the columns into a cartesian product of rows and join t1. (Lateral view helps to change columns into rows and is the magic in the sauce here)

with t_two as 
   (
    select * from t2 
    LATERAL VIEW explode(split(t2.ids,',')) as my_ids 
    LATERAL VIEW explode(split(t2.names,',')) as my_names
   )
select t1.* 
from 
  t1 join t_two 
  on 
    t_two.my_names = t1.name 
  and 
    t_two.my_ids = t1.id ;
  •  Tags:  
  • Related