Adding a column in dataframe based on another column

Time:01-29

Add a boolean column based on an array-of-strings column

CodePudding user response:

Use the exists higher-order function, available since Spark 2.4:

import org.apache.spark.sql.functions.{col, concat_ws, expr}

val df = customerDocument.withColumn(
  "flag",
  expr("exists(address, x -> x rlike 'test string')")
)

For older versions, you can convert the array into a single string and then use rlike:

val df = customerDocument.withColumn(
  "flag",
  concat_ws(",", col("address")).rlike("test string")
)
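
The two approaches above are not always interchangeable: exists tests each array element separately, while concat_ws followed by rlike searches the joined string, so anchored patterns or patterns containing the separator can give different results. The following is a minimal plain-Scala sketch of that difference, with no Spark involved. The object and method names are hypothetical, and it assumes rlike behaves like a "find anywhere" regex search.

```scala
import scala.util.matching.Regex

// Hypothetical helper object, for illustration only (not part of Spark).
object ArrayMatchSketch {
  // Analogue of exists(address, x -> x rlike pattern):
  // true if at least one single element contains a match.
  def anyElementMatches(address: Seq[String], pattern: String): Boolean = {
    val regex = new Regex(pattern)
    address.exists(element => regex.findFirstIn(element).isDefined)
  }

  // Analogue of concat_ws(",", address).rlike(pattern):
  // join the elements first, then search the whole joined string.
  def joinedMatches(address: Seq[String], pattern: String): Boolean =
    new Regex(pattern).findFirstIn(address.mkString(",")).isDefined
}
```

With Seq("High Street", "Sydney"), the pattern "Sydney" matches under both methods. The anchored pattern "^Sydney$" matches only per element, and "Street,Sydney" matches only the joined string. For plain substrings without commas or anchors, as in the example below, both approaches should agree.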

Example:

val df = Seq(
  (Seq("ADR249,IND0100,300", "Purcell Road", "Road Town", "British Virgin Islands,300,Purcell Road,Road Town,British Virgin Islands")),
  (Seq("ADR500,IND0268,425", "High Street", "Sydney", "Australia,425,High Street,Sydney,Australia"))
).toDF("address")

df.withColumn(
  "flag", 
  concat_ws(",", col("address")).rlike("British Virgin Islands")
).show(false)

//+-----------------------------------------------------------------------------------------------------------------------+-----+
//|address                                                                                                                |flag |
//+-----------------------------------------------------------------------------------------------------------------------+-----+
//|[ADR249,IND0100,300, Purcell Road, Road Town, British Virgin Islands,300,Purcell Road,Road Town,British Virgin Islands]|true |
//|[ADR500,IND0268,425, High Street, Sydney, Australia,425,High Street,Sydney,Australia]                                  |false|
//+-----------------------------------------------------------------------------------------------------------------------+-----+

EDIT

For your specific Spark version (<2.1), you can't use concat_ws to convert the array into a string. You need to use DataFrame.map instead:

df.map(r => {
  // Join the array elements of the first column into one comma-separated string
  r.getSeq[String](0).mkString(",")
}).toDF("address").withColumn(
  "flag",
  col("address").rlike("British Virgin Islands")
).show(false)