I am new to Spark/Scala and have been struggling with this problem. I have looked into similar questions involving `explode` and `split`, but have had no luck so far.
Here is an example input DataFrame:
| id | attr_name | attr_value |
|---|---|---|
| 0 | name | James |
| 0 | hair_color | black |
| 1 | name | George |
| 1 | hair_color | black |
| 2 | name | Jack |
| 2 | hair_color | white |
| 2 | eye_color | blue |
And here is an example of the output I am looking for:
| id | name | hair_color | eye_color |
|---|---|---|---|
| 0 | James | black | |
| 1 | George | black | |
| 2 | Jack | white | blue |
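To make the requested reshaping concrete, here is a minimal plain-Scala sketch (no Spark) of the same long-to-wide transformation on in-memory tuples. It assumes at most one value per (`id`, `attr_name`) pair and a fixed list of output columns. The names `ReshapeSketch`, `sample` and `toWide` are hypothetical. A missing attribute shows up as `None`, which corresponds to the empty cells in the output table.

```scala
// Hypothetical plain-Scala illustration of the long-to-wide reshape (no Spark).
object ReshapeSketch {
  // Output columns, in the order they should appear (assumed fixed up front).
  val columns: Seq[String] = Seq("name", "hair_color", "eye_color")

  // Long-format rows from the input table: (id, attr_name, attr_value).
  val sample: Seq[(Int, String, String)] = Seq(
    (0, "name", "James"), (0, "hair_color", "black"),
    (1, "name", "George"), (1, "hair_color", "black"),
    (2, "name", "Jack"), (2, "hair_color", "white"), (2, "eye_color", "blue")
  )

  // One result per id; each column is looked up by attr_name, None if absent.
  def toWide(rows: Seq[(Int, String, String)]): Seq[(Int, Seq[Option[String]])] =
    rows
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1)
      .map { case (id, group) =>
        // Assumes one value per (id, attr_name); duplicates would keep the last one.
        val attrs: Map[String, String] = group.map { case (_, k, v) => k -> v }.toMap
        id -> columns.map(attrs.get)
      }

  def main(args: Array[String]): Unit =
    toWide(sample).foreach(println)
}
```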
Any help would be appreciated here, thanks!
Answer:
I believe you're looking for pivot. Your example becomes something like:
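```scala
import org.apache.spark.sql.functions.first
import spark.implicits._ // enables $"..." column syntax; `spark` is your SparkSession

df.groupBy($"id")
  .pivot($"attr_name", Seq("name", "hair_color", "eye_color"))
  .agg(first($"attr_value"))
```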
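Explicitly listing the values of the `attr_name` column saves work: without the list, Spark first has to compute the distinct values of `attr_name` internally before it can pivot. The list also fixes the column order, but any `attr_name` value not in it is dropped from the output. The `agg` is necessary: `pivot` returns a `RelationalGroupedDataset` rather than a DataFrame, so an aggregation is required to produce the result. Since each (`id`, `attr_name`) pair has exactly one value here, `first` simply picks it, and ids with no row for an attribute (e.g. no `eye_color` for ids 0 and 1) get `null` in that column.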
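For a self-contained check, the sketch below rebuilds the sample input in a local `SparkSession` and applies the same pivot. It assumes Spark 2.x/3.x is on the classpath. The object and method names (`PivotExample`, `sampleInput`, `pivotAttributes`) are hypothetical, and it uses the string-based `pivot(String, Seq[Any])` and `first(String)` overloads, which behave the same as the `$"..."` form.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

// Hypothetical end-to-end example; names are illustrative only.
object PivotExample {
  // The long-format input from the question: one row per (id, attribute).
  def sampleInput(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (0, "name", "James"),
      (0, "hair_color", "black"),
      (1, "name", "George"),
      (1, "hair_color", "black"),
      (2, "name", "Jack"),
      (2, "hair_color", "white"),
      (2, "eye_color", "blue")
    ).toDF("id", "attr_name", "attr_value")
  }

  // Long-to-wide reshape: one output row per id, one column per listed attr_name.
  def pivotAttributes(df: DataFrame): DataFrame =
    df.groupBy("id")
      // Explicit values: no distinct-values pass, and a fixed column order.
      .pivot("attr_name", Seq("name", "hair_color", "eye_color"))
      // pivot yields a RelationalGroupedDataset, so an aggregate is required;
      // first picks the single attr_value in each (id, attr_name) cell.
      .agg(first("attr_value"))
      .orderBy("id")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pivot-example")
      .master("local[*]")
      .getOrCreate()
    try {
      // Missing attributes (eye_color for ids 0 and 1) come out as null.
      pivotAttributes(sampleInput(spark)).show()
    } finally spark.stop()
  }
}
```

The resulting columns are `id`, `name`, `hair_color`, `eye_color`: the grouping column first, then the pivot values in the order they were listed.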
