How to dynamically create pyspark code from config file or array?

Time:02-01

Does anyone know if there is a way to dynamically build a PySpark command from config input?

I'm trying to end up with a command that looks something like this:

newDF = df.select('*', when(df.a == 1,'1').when(df.a == 2, '2').alias('new_col') )

The when expressions vary in both number and content, so the input could be something like:

conditions = [
    "when(df.a == 'something')",
    "when(df.a >= 'something else')"
]

I can design the structure of the conditions, so that part is still to be decided. The real question is how to build a command from this kind of input without Spark thinking I am trying to pass it a string. My first attempt looked like this:

command = ".when(df.a == '1', 'something').when(df.a == '2', 'somethingelse')"

newDF = df.select("*", command)

However, I get an error: Spark doesn't like that I'm passing a string.

Any help appreciated!

CodePudding user response:

There are probably many ways to do this, but here are two options based on the examples in your question:

Using string expressions

You can keep a list of tuples, each holding the name of a column to create and the corresponding SQL expression, and pass each expression to the F.expr function like this:

from pyspark.sql import functions as F

new_cols = [
    ("new_col", "case when a = 1 then 'something' when a = 2 then 'somethingelse' end"),
    ("new_col2", "case when a = 1 then true when a = 2 then false end")
]

df.select("*", *[F.expr(x[1]).alias(x[0]) for x in new_cols])
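If the config holds structured rules rather than ready-made SQL, the CASE expression string can be assembled first and then handed to F.expr. The following is only a minimal sketch: the rule keys (op, value, then), the helper names sql_literal and build_case_expr, and the operator whitelist are all hypothetical, and the quoting assumes Spark SQL's default string-literal escaping with a backslash.

```python
# Hypothetical whitelist so that arbitrary config text never reaches the SQL string.
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def sql_literal(value):
    """Render a Python value as a SQL literal (helper name is an assumption)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Assumes the default Spark SQL parser, where backslash escapes a quote.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_case_expr(column, rules, default=None):
    """Build a 'case when ... end' string from a list of rule dicts."""
    if not rules:
        raise ValueError("at least one rule is required")
    branches = []
    for rule in rules:
        if rule["op"] not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {rule['op']}")
        branches.append(
            f"when {column} {rule['op']} {sql_literal(rule['value'])} "
            f"then {sql_literal(rule['then'])}"
        )
    else_part = f" else {sql_literal(default)}" if default is not None else ""
    return "case " + " ".join(branches) + else_part + " end"


# Illustrative rules, as they might look after being loaded from a config file.
rules = [
    {"op": "=", "value": 1, "then": "something"},
    {"op": "=", "value": 2, "then": "somethingelse"},
]
case_sql = build_case_expr("a", rules)

# Usage with an existing DataFrame df that has a column "a":
#   from pyspark.sql import functions as F
#   newDF = df.select("*", F.expr(case_sql).alias("new_col"))
```

Building the string in plain Python keeps the config format independent of Spark, and the result can be printed or logged before it is passed to F.expr.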

Dynamically construct when expression

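You can define a list of case/when conditions for the column to create, then use Python's functools.reduce to build the when expression. Starting the reduction from the F module itself means the first step calls F.when and every later step calls when on the Column returned by the previous one: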

from functools import reduce
from pyspark.sql import functions as F

conditions = [
    ('1', 'something'),
    ('2', 'somethingelse')
]

new_col = reduce(lambda acc, x: acc.when(F.col("a") == x[0], x[1]), conditions, F)

df.select("*", new_col.alias("new_col"))
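
Since the question mentions a config file, here is one possible way to drive the same when chain from JSON. This is a sketch under assumptions, not a definitive implementation: the JSON layout, the OPS table and the build_when_column helper are hypothetical names, and the functions module is passed in as a parameter so that the helper itself does not need a running Spark session.

```python
import json
import operator

# Hypothetical mapping from operator names in the config to Python comparisons.
# Applied to a PySpark Column, each of these returns a boolean Column.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def build_when_column(F, column_name, rules, default=None):
    """Chain F.when(...).when(...) from rule dicts.

    F is expected to be pyspark.sql.functions; it is a parameter here only
    to keep this sketch importable without Spark.
    """
    if not rules:
        raise ValueError("at least one rule is required")
    col = F.col(column_name)
    expr = F  # first iteration calls F.when, later ones call Column.when
    for rule in rules:
        condition = OPS[rule["op"]](col, rule["value"])
        expr = expr.when(condition, rule["then"])
    if default is not None:
        expr = expr.otherwise(default)
    return expr


# Illustrative config content; in practice this would be read from a file.
config_text = """
[
    {"op": "==", "value": 1, "then": "something"},
    {"op": ">=", "value": 2, "then": "somethingelse"}
]
"""
rules = json.loads(config_text)

# Usage with an existing DataFrame df that has a column "a":
#   from pyspark.sql import functions as F
#   newDF = df.select("*", build_when_column(F, "a", rules).alias("new_col"))
```

Looking operators up in a fixed table avoids calling eval on config text, and the explicit check for an empty rule list avoids ending up with the module F where a Column is expected.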