I have data source like below:
order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00.0,11599,CLOSED
2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT
3,2013-07-25 00:00:00.0,12111,COMPLETE
4,2013-07-25 00:00:00.0,8827,CLOSED
I am trying to convert to mm/dd/yyyy only for CLOSED orders using the below queries but getting output as null. can you please assist to get required date format using DSL or spark sql method:
closed_df=ord_df.select(date_format(to_date('order_date','yyyy-mm-dd hh:mm:SS.a'),'mm/dd/yyyy') .\
alias("formate_date")).show()
#output:
|formate_date|
------------
| null|
| null|
ord_df.createOrReplaceTempView("orders")
cld_df = spark.sql( """select order_id, date_format(to_date("order_date","yyyy-mm-dd hh:mm:ss.a"),'mm/dd/yyyy') as order_date,\
order_customer_id, order_status \
from orders where order_status = 'CLOSED'""").show()
#output:
|order_id|order_date|order_customer_id|order_status|
-------- ---------- ----------------- ------------
| 1| null| 11599| CLOSED|
| 4| null| 8827| CLOSED
CodePudding user response:
The format to date from the string 2013-07-25 00:00:00.0 is yyyy-MM-dd HH:mm:SS.s. Likewise for date formatting the format is MM/dd/yyyy. Here the Spark formatting doc for more information.
data = [(1, "2013-07-25 00:00:00.0", 11599, "CLOSED",),
(2, "2013-07-25 00:00:00.0", 256, "PENDING_PAYMENT",),
(3, "2013-07-25 00:00:00.0", 12111, "COMPLETE",),
(4, "2013-07-25 00:00:00.0", 8827, "CLOSED",), ]
ord_df = spark.createDataFrame(data, ("order_id", "order_date", "order_customer_id", "order_status",))
from pyspark.sql.functions import to_date, date_format
closed_df = (ord_df.where("order_status = 'CLOSED'")
.select(date_format(to_date('order_date','yyyy-MM-dd HH:mm:SS.s'),'MM/dd/yyyy')
.alias("formate_date"))).show()
"""
------------
|formate_date|
------------
| 07/25/2013|
| 07/25/2013|
------------
"""
ord_df.createOrReplaceTempView("orders")
cld_df = spark.sql( """select order_id, date_format(to_date(order_date,"yyyy-MM-dd HH:mm:SS.s"), "MM/dd/yyyy") as order_date,\
order_customer_id, order_status \
from orders where order_status = 'CLOSED'""").show()
"""
-------- ---------- ----------------- ------------
|order_id|order_date|order_customer_id|order_status|
-------- ---------- ----------------- ------------
| 1|07/25/2013| 11599| CLOSED|
| 4|07/25/2013| 8827| CLOSED|
-------- ---------- ----------------- ------------
"""
