I have a PySpark DataFrame:
Month     Location   Sales
1/1/2020  India      0.4799
1/1/2020  USA        0.08840739
2/1/2020  Australia  0.1957
2/1/2020  China      0.204
I need to create a new column, Sales_new, that shows the Sales value as a percentage: multiplied by 100, keeping only 2 digits after the decimal point, with % appended.
My desired DataFrame is:
Month     Location   Sales_new
1/1/2020  India      47.99%
1/1/2020  USA        8.84%
2/1/2020  Australia  19.57%
2/1/2020  China      20.4%
I tried with this code:
df1 = df.withColumn('Sales_new', F.round(F.concat(col('Sales')*100, F.lit("%")),1))
and I'm getting:
Month     Location   Sales_new
1/1/2020  India      null
1/1/2020  USA        null
2/1/2020  Australia  null
2/1/2020  China      null
Am I missing something? Please suggest any other approaches as well.
CodePudding user response:
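The calls are in the wrong order. concat returns a string such as "47.99%", and round cannot interpret that string as a number, so every row ends up null. Round first, then concatenate: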
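import pyspark.sql.functions as F
# Scale to a percentage, round to 2 decimal places, then append the % sign
df1 = df.withColumn('Sales_new', F.concat(F.round(F.col('Sales')*100, 2), F.lit("%")))
df1.show()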
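To see why the order matters, here is a minimal plain-Python sketch of the same two orderings. It does not use Spark: spark_like_round is a hypothetical helper that only imitates the behaviour seen in the question (a non-numeric string turns into a missing value), and the sample values are the ones from the question.

```python
def spark_like_round(value, scale):
    """Hypothetical stand-in: round a value, or return None if it is not numeric."""
    try:
        return round(float(value), scale)
    except (TypeError, ValueError):
        return None  # plays the role of the null seen in the question


sales = [0.4799, 0.08840739, 0.1957, 0.204]

# Wrong order: build the "...%" string first, then try to round it.
wrong = [spark_like_round(f"{s * 100}%", 1) for s in sales]

# Right order: round the number first, then append the percent sign.
right = [f"{round(s * 100, 2)}%" for s in sales]

print(wrong)
print(right)
```

The first list contains only None, because a string ending in % is not a number. The second list holds the desired strings.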
CodePudding user response:
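You can use format_number with a pattern string instead of rounding and concatenating. The % in the pattern scales the value by 100 and appends the sign, and ## keeps at most 2 decimal digits: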
import pyspark.sql.functions as F
df1 = df.withColumn('Sales_new', F.expr("format_number(Sales, '0.##%')"))
df1.show()
#+--------+---------+----------+---------+
#|   Month| Location|     Sales|Sales_new|
#+--------+---------+----------+---------+
#|1/1/2020|    India|    0.4799|   47.99%|
#|1/1/2020|      USA|0.08840739|    8.84%|
#|2/1/2020|Australia|    0.1957|   19.57%|
#|2/1/2020|    China|     0.204|    20.4%|
#+--------+---------+----------+---------+
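If the number of decimal places needs to vary, the pattern string can be built instead of hard-coded. The following is only a sketch: percent_pattern and with_percent_column are hypothetical helper names, not part of PySpark, and the sketch assumes a Spark version in which format_number accepts a pattern string, as in the answer above.

```python
def percent_pattern(decimals=2):
    """Hypothetical helper: build a pattern such as '0.##%'."""
    if decimals < 0:
        raise ValueError("decimals must be non-negative")
    # Each '#' is one optional fraction digit; no fraction part when decimals is 0.
    fraction = "." + "#" * decimals if decimals else ""
    return f"0{fraction}%"


def with_percent_column(df, source, target, decimals=2):
    """Hypothetical helper: add `target` as a percent string built from `source`."""
    # Imported here so percent_pattern can be used without Spark installed.
    import pyspark.sql.functions as F

    pattern = percent_pattern(decimals)
    # Backticks guard column names that contain spaces or dots.
    return df.withColumn(target, F.expr(f"format_number(`{source}`, '{pattern}')"))
```

With the DataFrame from the question, with_percent_column(df, "Sales", "Sales_new") should be equivalent to the one-liner above.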
