How to trim zeros after decimal value without changing the datatype using PySpark?

My requirement is to remove trailing zeros from a decimal value. I tried regex and strip(), and both removed the trailing zeros, but they operate on the string datatype. I want Col_2 to remain decimal without changing the precision and scale. Can someone please suggest an alternative?

Sample Table :

Col_1 : string

Col_2 : decimal(18,2)

Col_1	Col_2
ABC	2.00
DEF	2.50
XMN	0.00
RST	1.28

Actual Result using regex:

Col_1 : string

Col_2 : string

Col_1	Col_2
ABC	2
DEF	2.5
XMN
RST	1.28

Expected Result :

Col_1	Col_2
ABC	2
DEF	2.5
XMN	0
RST	1.28

Answer:

A decimal(18,2) column always stores two digits after the decimal point. Whether the trailing zeros to the right of the decimal point are shown is purely a matter of formatting. So, if you want Col_2 to stay decimal and keep its precision and scale, store it as decimal(18,2) and format it however you like when displaying the data.
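The same split between stored value and displayed text can be seen in plain Python, outside Spark. The sketch below is only an analogy using the standard decimal module, not Spark's own behavior, and trim_for_display is a hypothetical helper name introduced for illustration:

```python
from decimal import Decimal


def trim_for_display(value: Decimal) -> str:
    """Return the value as text without trailing fractional zeros (hypothetical helper)."""
    # normalize() drops trailing zeros; the "f" format avoids exponent
    # notation for whole numbers such as 100 (which normalizes to 1E+2).
    return format(value.normalize(), "f")


# Values as they would be held at scale 2.
stored = [Decimal("2.00"), Decimal("2.50"), Decimal("0.00"), Decimal("1.28")]

# Text produced only for display; the stored values are left untouched.
shown = [trim_for_display(v) for v in stored]

for value, text in zip(stored, shown):
    print(repr(value), "->", text)
```

The stored values keep their scale, and only the text produced for display loses the zeros.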

You can use the format_number function for that. Note that it returns a string, so apply it only at display time:

import pyspark.sql.functions as F

df.withColumn("Col_2", F.expr("format_number(Col_2, '0.##')")).show()

#+-----+-----+
#|Col_1|Col_2|
#+-----+-----+
#|  ABC|    2|
#|  DEF|  2.5|
#|  XMN|    0|
#|  RST| 1.28|
#+-----+-----+