Add a boolean column based on an array-of-strings column
CodePudding user response:
Use the exists higher-order function, available from Spark 2.4:
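import org.apache.spark.sql.functions.{col, concat_ws, expr}

// flag is true when at least one element of the address array matches the regex
val df = customerDocument.withColumn(
  "flag",
  expr("exists(address, x -> x rlike 'test string')")
)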
For older versions, you can join the array into a single string with concat_ws and then use rlike on it:
val df = customerDocument.withColumn(
  "flag",
  concat_ws(",", col("address")).rlike("test string")
)
Example:
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark

val df = Seq(
  Seq("ADR249,IND0100,300", "Purcell Road", "Road Town", "British Virgin Islands,300,Purcell Road,Road Town,British Virgin Islands"),
  Seq("ADR500,IND0268,425", "High Street", "Sydney", "Australia,425,High Street,Sydney,Australia")
).toDF("address")

df.withColumn(
  "flag",
  concat_ws(",", col("address")).rlike("British Virgin Islands")
).show(false)
//+-----------------------------------------------------------------------------------------------------------------------+-----+
//|address                                                                                                                |flag |
//+-----------------------------------------------------------------------------------------------------------------------+-----+
//|[ADR249,IND0100,300, Purcell Road, Road Town, British Virgin Islands,300,Purcell Road,Road Town,British Virgin Islands]|true |
//|[ADR500,IND0268,425, High Street, Sydney, Australia,425,High Street,Sydney,Australia]                                  |false|
//+-----------------------------------------------------------------------------------------------------------------------+-----+
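One thing to keep in mind: matching on the joined string is not exactly the same as matching each element, because a pattern that contains the separator can match across two neighbouring elements. Here is a minimal plain-Scala sketch of the difference (no Spark involved). The object and method names are made up for illustration, and it assumes rlike behaves like Java's Matcher.find, i.e. it looks for a match anywhere in the string:

```scala
import java.util.regex.Pattern

object ArrayMatchSketch {
  // Per-element check: true when some single element contains a match.
  def matchesAnyElement(xs: Seq[String], regex: String): Boolean = {
    val p = Pattern.compile(regex)
    xs.exists(x => p.matcher(x).find())
  }

  // Joined check: concatenate with "," first, then search the whole string.
  def matchesJoined(xs: Seq[String], regex: String): Boolean =
    Pattern.compile(regex).matcher(xs.mkString(",")).find()

  def main(args: Array[String]): Unit = {
    val address = Seq("Purcell Road", "Road Town")
    // "Road,Road" only exists once the elements are glued together.
    println(matchesAnyElement(address, "Road,Road")) // false
    println(matchesJoined(address, "Road,Road"))     // true
  }
}
```

If your pattern never contains the separator character, both approaches give the same flag, so this only matters for patterns that could straddle a comma.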
EDIT
For your specific Spark version (< 2.1), you can't use concat_ws to convert the array into a string. Use DataFrame.map instead, like this:
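// Join each row's address array into one comma-separated string, then match on it.
// Note: the resulting address column is a string, not an array.
df.map(r => r.getSeq[String](0).mkString(","))
  .toDF("address")
  .withColumn("flag", col("address").rlike("British Virgin Islands"))
  .show(false)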
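Another option that may work on old versions is a UDF that checks the array elements directly, which keeps the address column as an array. This is only a sketch: AddressFlagUdf, anyElementMatches and hasBvi are hypothetical names, df is assumed to be the example DataFrame with an array-of-strings address column, and I have not run it on a Spark version below 2.1:

```scala
import java.util.regex.Pattern
import org.apache.spark.sql.functions.{col, udf}

object AddressFlagUdf {
  // Builds a predicate that is true when any non-null element contains a regex match.
  // The pattern is compiled once and captured by the returned function.
  def anyElementMatches(regex: String): Seq[String] => Boolean = {
    val p = Pattern.compile(regex)
    xs => xs != null && xs.exists(x => x != null && p.matcher(x).find())
  }

  // Wrap the predicate as a Spark UDF (hypothetical name).
  val hasBvi = udf(anyElementMatches("British Virgin Islands"))
}

// Usage, assuming df has an array<string> column named address:
// df.withColumn("flag", AddressFlagUdf.hasBvi(col("address"))).show(false)
```

A UDF is opaque to the optimizer, so on Spark 2.4 and later the built-in exists expression is usually the better choice.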
