I have two Spark dataframes with different values that I would like to concatenate:
df:
c1 c2
A D
B E
B F
df2:
A B
key1 4
key2 5
key3 6
I would like to concatenate the unique values of one column from each dataframe (c1 from df and A from df2) into a single dataframe, with a second column recording which dataframe each value came from. Thus, the output would be
res:
values origin
A first
B first
key1 second
key2 second
key3 second
CodePudding user response:
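A simple union should do the job. Select the distinct values of the relevant column from each dataframe, tag them with a literal origin column, and union the results: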
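import pyspark.sql.functions as F
# Distinct values of c1 from the first dataframe (named df in the question), tagged with their origin
first = df.select(F.col("c1").alias("values")).distinct().withColumn("origin", F.lit("first"))
# Distinct values of A from the second dataframe, tagged the same way
second = df2.select(F.col("A").alias("values")).distinct().withColumn("origin", F.lit("second"))
# union matches columns by position, so both sides must list values, origin in the same order
res = first.union(second)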
