Create and cast a new column from existing column with % concatenation

I have a PySpark dataframe:

Month       Location    Sales
1/1/2020    India       0.4799
1/1/2020    USA         0.08840739
2/1/2020    Australia   0.1957
2/1/2020    China       0.204

I need to create a new column, Sales_new, that shows Sales as a percentage with at most 2 digits after the decimal point and a trailing %. My desired dataframe is:

Month       Location    Sales_new
1/1/2020    India       47.99%
1/1/2020    USA         8.84%
2/1/2020    Australia   19.57%
2/1/2020    China       20.4%

I tried with this code:

df1 = df.withColumn('Sales_new', F.round(F.concat(col('Sales')*100, F.lit("%")),1))

and I'm getting:

Month       Location    Sales_new
1/1/2020    India       null
1/1/2020    USA         null
2/1/2020    Australia   null
2/1/2020    China       null

Am I missing something? Suggestions for other approaches are welcome too.

CodePudding user response：

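You should apply round first and then concat. In your version, concat produces a string ending in %, and round cannot convert that string back to a number, so every row comes out null.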

df1 = df.withColumn('Sales_new', F.concat(F.round(F.col('Sales')*100, 2), F.lit("%")))
df1.show()
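The same ordering can be wrapped in a small function. This is only a sketch: the names percent_label and add_sales_new are made up for illustration, the explicit cast to string is my addition (concat would otherwise cast implicitly), and Sales is assumed to be a numeric column. The plain-Python helper is just a reference for the strings expected on the sample rows, not part of the Spark job.

```python
def percent_label(value, digits=2):
    """Plain-Python reference: scale to percent, round, then append '%'."""
    return f"{round(value * 100, digits)}%"


def add_sales_new(df, digits=2):
    """Add Sales_new to a Spark DataFrame that has a numeric Sales column."""
    # Imported here so percent_label stays usable without Spark installed.
    from pyspark.sql import functions as F

    # Do the numeric work first: scale to percent and round.
    rounded = F.round(F.col("Sales") * 100, digits)
    # Then switch to strings: cast explicitly and append the percent sign.
    return df.withColumn(
        "Sales_new", F.concat(rounded.cast("string"), F.lit("%"))
    )


# Expected strings for the four sample rows in the question.
labels = [percent_label(v) for v in (0.4799, 0.08840739, 0.1957, 0.204)]
```

With a dataframe like the one in the question, df1 = add_sales_new(df) followed by df1.show() should give the same Sales_new values as the one-liner above.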

CodePudding user response：

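You can use format_number with a format pattern instead of rounding and concatenating. The pattern is applied in the style of Java's DecimalFormat, so the % multiplies the value by 100 and ## keeps at most two decimal places: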

import pyspark.sql.functions as F

df1 = df.withColumn('Sales_new', F.expr("format_number(Sales, '0.##%')"))

df1.show()
#+--------+---------+----------+---------+
#|   Month| Location|     Sales|Sales_new|
#+--------+---------+----------+---------+
#|1/1/2020|    India|    0.4799|   47.99%|
#|1/1/2020|      USA|0.08840739|    8.84%|
#|2/1/2020|Australia|    0.1957|   19.57%|
#|2/1/2020|    China|     0.204|    20.4%|
#+--------+---------+----------+---------+
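Note that this prints 20.4% for China, which matches the desired output in the question. If you would rather always show two decimal places (20.40%), printf-style formatting with format_string is one option. A minimal sketch, assuming Sales is numeric; fixed_percent and add_sales_fixed are hypothetical names, and the plain-Python helper only mirrors the pattern so the expected strings can be checked without Spark.

```python
def fixed_percent(value):
    """Plain-Python reference for the printf-style pattern used below."""
    # %.2f always prints two decimals; %% is a literal percent sign.
    return "%.2f%%" % (value * 100)


def add_sales_fixed(df):
    """Add Sales_new with exactly two decimals to a Spark DataFrame."""
    # Imported here so fixed_percent stays usable without Spark installed.
    from pyspark.sql import functions as F

    # format_string applies a printf-style format to the column value.
    return df.withColumn(
        "Sales_new", F.format_string("%.2f%%", F.col("Sales") * 100)
    )


# Expected strings for the four sample rows in the question.
fixed_labels = [fixed_percent(v) for v in (0.4799, 0.08840739, 0.1957, 0.204)]
```

The only difference from the format_number pattern on this sample data is the trailing zero in the last row.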