Create separate columns for key-value pairs contained in two columns of a Spark DataFrame in Scala

I am new to Spark/Scala and have been struggling with this problem. I have looked into similar questions involving explode and split, but have had no luck so far.

Here is an example input Dataframe:

id	attr_name	attr_value
0	name	James
0	hair_color	black
1	name	George
1	hair_color	black
2	name	Jack
2	hair_color	white
2	eye_color	blue

And here is an example of the output I am looking for:

id	name	hair_color	eye_color
0	James	black
1	George	black
2	Jack	white	blue

Any help would be appreciated here, thanks!

CodePudding user response:

I believe you're looking for pivot. Your example becomes something like:

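import org.apache.spark.sql.functions.first  // the $"..." syntax also needs spark.implicits._

df.groupBy($"id")
  .pivot($"attr_name", Seq("name", "hair_color", "eye_color"))  // every listed value becomes a column
  .agg(first($"attr_value"))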

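Explicitly spelling out the values in the attr_name column will give you a decent performance improvement, because Spark otherwise has to scan the data first to find the distinct values. The agg is necessary even though you have one element in each group: pivot returns a RelationalGroupedDataset, and only an aggregation turns it back into a DataFrame. Here first simply picks that single value.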
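
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the pivot. The object name PivotExample, the helper pivotAttributes, and the local[*] master are my own choices for illustration, not anything from the question, and it assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, first}

object PivotExample {
  // Hypothetical helper: turns (id, attr_name, attr_value) rows into one row per id.
  // The attribute names are listed explicitly so Spark does not have to look them up.
  def pivotAttributes(df: DataFrame): DataFrame =
    df.groupBy(col("id"))
      .pivot("attr_name", Seq("name", "hair_color", "eye_color"))
      .agg(first(col("attr_value")))

  def main(args: Array[String]): Unit = {
    // Local session for experimenting; the master setting is an assumption.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-example")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq(
      (0, "name", "James"),
      (0, "hair_color", "black"),
      (1, "name", "George"),
      (1, "hair_color", "black"),
      (2, "name", "Jack"),
      (2, "hair_color", "white"),
      (2, "eye_color", "blue")
    ).toDF("id", "attr_name", "attr_value")

    // One row per id; attributes missing for an id come out as null.
    pivotAttributes(df).orderBy(col("id")).show()

    spark.stop()
  }
}
```

With the sample data, ids 0 and 1 should show null in the eye_color column. If you don't know the attribute names up front, pivot("attr_name") without the Seq also works, at the cost of the extra pass over the data.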