Concatenation of unique values into a Spark dataframe

I have two Spark dataframes with different values that I would like to concatenate:

df:

c1    c2
A     D
B     E
B     F

df2:

A    B
key1 4
key2 5
key3 6

I would like to concatenate the unique values of certain columns in these dataframes (here c1 from df and A from df2) into a single dataframe, with a second column recording which dataframe each value came from. The output would be:

res:

values      origin
A           first
B           first
key1        second
key2        second
key3        second

User response:

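A simple union should do the job: select the column of interest from each dataframe, keep its distinct values, add a literal origin column, and union the two results: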

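import pyspark.sql.functions as F

# Distinct values of c1 from the first dataframe (named df in the question), tagged "first"
first = df.select(F.col("c1").alias("values")).distinct().withColumn("origin", F.lit("first"))

# Distinct values of A from the second dataframe, tagged "second"
second = df2.select(F.col("A").alias("values")).distinct().withColumn("origin", F.lit("second"))

# union matches columns by position and keeps duplicates (like SQL UNION ALL)
res = first.union(second)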