Exclude column from json string?

Time:01-13

I have a column in a DataFrame that contains a JSON string. Is there a way to exclude some of the keys ("columns") from that JSON?

Input:

{"column1":"data", "column2":"data"}

Expected output:

{"column1":"data"}

CodePudding user response:

You can parse the JSON string into a MapType column with the from_json function, filter the map with map_filter to exclude the keys you don't want, then convert it back to a JSON string using to_json:

import pyspark.sql.functions as F

df = spark.createDataFrame([('{"column1":"data", "column2":"data"}',)], ["json_col"])

cols_to_exclude = ["column2"]

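# Parse the JSON string into a map, drop the unwanted keys, then serialize back to JSON.
# Note: map_filter with a Python lambda requires Spark 3.1 or later.
df1 = df.withColumn(
    "json_col",
    F.from_json("json_col", "map<string,string>")  # JSON string -> map of string keys to string values
).withColumn(
    "json_col",
    F.to_json(
        # Keep only the entries whose key is not in cols_to_exclude
        F.map_filter("json_col", lambda k, v: ~k.isin(cols_to_exclude))
    )
)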

df1.show()
#+------------------+
#|          json_col|
#+------------------+
#|{"column1":"data"}|
#+------------------+
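
The same key filtering can also be sketched in plain Python with the standard json module. This may be useful for checking the logic outside Spark, or as a fallback where map_filter is not available. The helper name drop_keys is hypothetical (it is not part of PySpark), and the sketch assumes each value is a JSON object, as in the example input.

```python
import json


def drop_keys(json_str, keys_to_exclude):
    """Return json_str with the given top-level keys removed.

    Hypothetical helper for illustration; assumes json_str holds a JSON object.
    """
    if json_str is None:
        # Pass nulls through unchanged, as a null column value would be
        return None
    obj = json.loads(json_str)
    kept = {k: v for k, v in obj.items() if k not in keys_to_exclude}
    # Compact separators produce the same spacing as the expected output
    return json.dumps(kept, separators=(",", ":"))


print(drop_keys('{"column1":"data", "column2":"data"}', ["column2"]))
# {"column1":"data"}
```

To apply it to a DataFrame column, the function could be wrapped in a UDF, for example F.udf(lambda s: drop_keys(s, cols_to_exclude), "string"). Python UDFs are generally slower than built-in functions, so the from_json / map_filter / to_json approach above is preferable where it works.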