I am working on this dataset having two columns(date->date, sales_%->string) -
| date | sales_% |
|---|---|
| 20/12/2021 | 50% |
| 21/12/2021 | 29% |
| 22/12/2021 | 60% |
| 23/12/2021 | 12% |
| 24/12/2021 | 75% |
| 25/12/2021 | 28% |
I want to convert the sales_% column to a decimal value(by dividing the % by 100).
The desired output should look like -
| date | sales |
|---|---|
| 20/12/2021 | 0.5 |
| 21/12/2021 | 0.29 |
| 22/12/2021 | 0.6 |
| 23/12/2021 | 0.12 |
| 24/12/2021 | 0.75 |
| 25/12/2021 | 0.28 |
This is what i have tried -
spark.sql("""select date, cast(cast(replace("sales_%",'%','') as integer)/100) as decimal(2,2)) from table_name""")
I am getting nulls. Let me know if anyone has a better idea.
CodePudding user response:
You are referencing column name as "sales_%" which is interpreted as literal string by Spark. You need to use back-ticks instead of quotes. Also, there is no need cast to integer before division by 100. Try this:
spark.sql("""
select date,
cast(replace(`sales_%`,'%','')/100 as decimal(2,2)) as sales
from table_name
""").show()
# ---------- -----
#| date|sales|
# ---------- -----
#|20/12/2021| 0.50|
#|21/12/2021| 0.29|
#|22/12/2021| 0.60|
#|23/12/2021| 0.12|
#|24/12/2021| 0.75|
#|25/12/2021| 0.28|
# ---------- -----
