Home > Mobile >  spark date_format results showing null
spark date_format results showing null

Time:01-08

I have data source like below:

order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00.0,11599,CLOSED
2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT
3,2013-07-25 00:00:00.0,12111,COMPLETE
4,2013-07-25 00:00:00.0,8827,CLOSED

I am trying to convert to mm/dd/yyyy only for CLOSED orders using the below queries but getting output as null. can you please assist to get required date format using DSL or spark sql method:

closed_df=ord_df.select(date_format(to_date('order_date','yyyy-mm-dd hh:mm:SS.a'),'mm/dd/yyyy') .\
                 alias("formate_date")).show()

#output:

|formate_date|
 ------------ 
|        null|
|        null|

ord_df.createOrReplaceTempView("orders")
cld_df = spark.sql( """select order_id, date_format(to_date("order_date","yyyy-mm-dd hh:mm:ss.a"),'mm/dd/yyyy') as order_date,\
                     order_customer_id, order_status \
                     from orders where order_status = 'CLOSED'""").show()

#output:

|order_id|order_date|order_customer_id|order_status|
 -------- ---------- ----------------- ------------ 
|       1|      null|            11599|      CLOSED|
|       4|      null|             8827|      CLOSED

CodePudding user response:

The format to date from the string 2013-07-25 00:00:00.0 is yyyy-MM-dd HH:mm:SS.s. Likewise for date formatting the format is MM/dd/yyyy. Here the Spark formatting doc for more information.


data = [(1, "2013-07-25 00:00:00.0", 11599, "CLOSED",),
        (2, "2013-07-25 00:00:00.0", 256, "PENDING_PAYMENT",),
        (3, "2013-07-25 00:00:00.0", 12111, "COMPLETE",),
        (4, "2013-07-25 00:00:00.0", 8827, "CLOSED",), ]

ord_df = spark.createDataFrame(data, ("order_id", "order_date", "order_customer_id", "order_status",))


from pyspark.sql.functions import to_date, date_format
closed_df = (ord_df.where("order_status = 'CLOSED'")
                   .select(date_format(to_date('order_date','yyyy-MM-dd HH:mm:SS.s'),'MM/dd/yyyy')
                      .alias("formate_date"))).show()

"""
 ------------ 
|formate_date|
 ------------ 
|  07/25/2013|
|  07/25/2013|
 ------------ 
"""

ord_df.createOrReplaceTempView("orders")
cld_df = spark.sql( """select order_id, date_format(to_date(order_date,"yyyy-MM-dd HH:mm:SS.s"), "MM/dd/yyyy") as order_date,\
                     order_customer_id, order_status \
                     from orders where order_status = 'CLOSED'""").show()

"""
 -------- ---------- ----------------- ------------ 
|order_id|order_date|order_customer_id|order_status|
 -------- ---------- ----------------- ------------ 
|       1|07/25/2013|            11599|      CLOSED|
|       4|07/25/2013|             8827|      CLOSED|
 -------- ---------- ----------------- ------------ 
"""
  •  Tags:  
  • Related