Suppose I have this data:
| ColumnA | ColumnB |
|---|---|
| row1 | valueA |
| row1 | valueB |
| row2 | valueB |
How can I join the values of ColumnB for rows that share the same value in ColumnA? Expected result:
| ColumnA | ColumnB |
|---|---|
| row1 | valueA, valueB |
| row2 | valueB |
CodePudding user response:
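You can group by ColumnA, gather the ColumnB values with collect_set, and join them with concat_ws. Note that collect_set drops duplicate values and does not guarantee their order; use collect_list if duplicates should be kept.
import org.apache.spark.sql.functions.{collect_set, concat_ws}
// ", " matches the separator in the expected output; alias keeps the column name
df.select("ColumnA","ColumnB")
.groupBy("ColumnA")
.agg(concat_ws(", ",collect_set("ColumnB")).alias("ColumnB"))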
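For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes a local SparkSession and rebuilds the sample rows from the question; the object name `JoinColumnB` and the app name are only illustrative. It also wraps the collected values in `sort_array`, which is an addition to the answer above, so the joined string comes out in a stable order.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_set, concat_ws, sort_array}

// Hypothetical wrapper object, only here to make the snippet runnable.
object JoinColumnB {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is acceptable for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-column-b")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq(
      ("row1", "valueA"),
      ("row1", "valueB"),
      ("row2", "valueB")
    ).toDF("ColumnA", "ColumnB")

    val joined = df
      .groupBy("ColumnA")
      // collect_set gathers the distinct ColumnB values per group,
      // sort_array orders them, concat_ws joins them with ", ".
      .agg(concat_ws(", ", sort_array(collect_set("ColumnB"))).alias("ColumnB"))
      .orderBy("ColumnA")

    joined.show(false)
    spark.stop()
  }
}
```

With the sample rows, row1 should come out as `valueA, valueB` and row2 as `valueB`. Swapping `collect_set` for `collect_list` would keep repeated values instead of dropping them.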
