Home > Enterprise >  How to read a string value in JSON array struct?
How to read a string value in JSON array struct?

Time:01-17

This is my code:

%spark.pyspark
    
df_05_body = spark.sql("""
     select 
    gtin
    , principalBody.constituents
 from 
v_df_04""")

df_05_body.createOrReplaceTempView("v_df_05_body")

df_05_body.printSchema()

This is the schema:

root
 |-- gtin: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- constituents: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- constituentCategory: struct (nullable = true)
 |    |    |    |    |-- value: string (nullable = true)
 |    |    |    |    |-- valueRange: string (nullable = true)

How to change the principalBody.constituents row in the SQL to read the fields constituentCategory.value and constituentCategory.valueRange?

CodePudding user response:

The column constituents is an array of arrays of structs. If your intent is to get a flat structure then you'll need to flatten the nested arrays, then explode:

df_05_body = spark.sql("""
    WITH
      v_df_04_exploded AS (
      SELECT
        gtin,
        explode(flatten(principalBody.constituents)) AS constituent
      FROM
        v_df_04 )

    SELECT
      gtin,
      constituent.constituentCategory.value,
      constituent.constituentCategory.valueRange
    FROM
      v_df_04_exploded
""")

Or simply using inline after flatten like this:

df_05_body = spark.sql("""
    SELECT
      gtin,
      inline(flatten(principalBody.constituents))
    FROM
      v_df_04_exploded
""")
  •  Tags:  
  • Related