Flatten a string array field in a dataframe-CodePudding

I have a dataframe that looks like this:

I want to flatten the barcode field of type StringType, instead of looking like this where there is more than one barcode for the same item, I want it to repeat, for example:

 ------------------------------------------------------------ 
|suborder| updated_on | suborder_item_id| barcode            | 
 -------- ------------ ----------------- --------------------                          
|1       | 123        | 3333            |["123","456","789"] |
 -------- ------------ ----------------- --------------------

I want it to repeat if there's more than one barcode in the same dataframe:

 -------- ---------- ---------------- ------- 
|suborder|updated_on|suborder_item_id|barcode|
 -------- ---------- ---------------- ------- 
|       1|       123|            3333|    123|
|       1|       123|            3333|    456|
|       1|       123|            3333|    789|
 -------- ---------- ---------------- -------

Your help would be greatly appreciated, I've been struggling with this for a while now.

CodePudding user response：

You are looking for just the explode function.

import pyspark.sql.functions as F

……

df = df.select('suborder', 'updated_on', 'suborder_item_id', F.explode('barcode').alias('barcode'))

CodePudding user response：

Use from_json function to parse the stringified array into ArrayType then explode it:

import pyspark.sql.functions as F

df = spark.createDataFrame([
    (1, 123, 3333, '["123","456","789"]')
], ["suborder", "updated_on", "suborder_item_id", "barcode"])

df1 = df.withColumn(
    "barcode",
    F.explode(F.from_json("barcode", "array<string>"))
)

df1.show()
# -------- ---------- ---------------- ------- 
#|suborder|updated_on|suborder_item_id|barcode|
# -------- ---------- ---------------- ------- 
#|       1|       123|            3333|    123|
#|       1|       123|            3333|    456|
#|       1|       123|            3333|    789|
# -------- ---------- ---------------- -------