How to check if a value in a column is found in a list in a column, with Spark SQL?-CodePudding

I have a delta table A as shown below.

point	cluster	points_in_cluster
37	1	[37,32]
45	2	[45,67,84]
67	2	[45,67,84]
84	2	[45,67,84]
32	1	[37,32]

Also I have a table B as shown below.

id	point
101	37
102	67
103	84

I want a query like the following. Here in obviously doesn't work for a list. So, what would be the right syntax?

select b.id, a.point
from A a, B b
where b.point in a.points_in_cluster

As a result I should have a table like the following

id	point
101	37
101	32
102	45
102	67
102	84
103	45
103	67
103	84

CodePudding user response：

Based on your data sample, I'd do an equi-join on point column and then an explode on points_in_cluster :

from pyspark.sql import functions as F

# assuming A is df_A and B is df_B

df_A.join(
    df_B,
    on="point"
).select(
    "id",
    F.explode("points_in_cluster").alias("point")
)

Otherwise, you use array_contains:

select b.id, a.point
from A a, B b
where array_contains(a.points_in_cluster, b.point)