Convert array of elements to multiple columns-CodePudding

How I can convert a array (in a column) with a set of elements in a JSON dataset to multiple columns with python, spark or pandas? The data is structured in this form:

root
 |-- items: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- idAccount: long (nullable = true)
 |    |    |-- infractionType: string (nullable = true)
 |    |    |-- responseTime: string (nullable = true)
 |    |    |-- status: string (nullable = true)
 |    |    |-- transactionCode: string (nullable = true)

I'm expecting some kind of this:

id	idAccount
value	value
value	value

CodePudding user response：

In Spark SQL, you can access the item in ArrayType or MapType column by using getItem. For example, you want to get the value of the id of first item, you can use df.select(func.getItem(0).getItem('id'))

CodePudding user response：

an array of structs can be exploded into columns using the inline sql function.

here's an example of how it works.

data_sdf = spark.createDataFrame([([(1234, 2345, 3456), (4321, 5432, 6543)],)], 
                                 'items array<struct<id: int, id_acc: int, foo: int>>'
                                 )

#  ---------------------------------------- 
# |items                                   |
#  ---------------------------------------- 
# |[{1234, 2345, 3456}, {4321, 5432, 6543}]|
#  ---------------------------------------- 

# root
#  |-- items: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- id: integer (nullable = true)
#  |    |    |-- id_acc: integer (nullable = true)
#  |    |    |-- foo: integer (nullable = true)

# explode and create new columns using struct fields - using `inline`
data_sdf. \
    selectExpr('inline(items)'). \
    show()

#  ---- ------ ---- 
# |  id|id_acc| foo|
#  ---- ------ ---- 
# |1234|  2345|3456|
# |4321|  5432|6543|
#  ---- ------ ----

you can further just select() the required fields after the explosion.