I want to convert a column of minutes into hours:minutes format.
| col(min) |
|---|
| 685 |
I want to obtain:
| col(min) | col1(h:min) |
|---|---|
| 685 | 11:25 |
Answer:
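Use the Spark SQL functions `div` and `mod` to get the quotient (hours) and remainder (minutes) respectively, and then concatenate them. Wrapping the remainder in `lpad` keeps the minutes at two digits, so 65 becomes `1:05` rather than `1:5`:

```python
from pyspark.sql import functions as F
df = df.withColumn('col1', F.expr("concat(div(col, 60), ':', lpad(mod(col, 60), 2, '0'))"))
```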
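Without zero-padding, concatenating the quotient and remainder turns 65 minutes into `1:5` instead of `1:05`. The sketch below mirrors both variants in plain Python so the difference can be checked without a Spark session. The helper names `concat_div_mod` and `concat_div_lpad_mod` are hypothetical, and the sketch assumes non-negative minute counts, where Python's `//` and `%` give the same results as Spark's `div` and `mod`.

```python
# Hypothetical helpers mirroring the Spark SQL expressions (non-negative input assumed)

def concat_div_mod(minutes: int) -> str:
    """Like concat(div(m, 60), ':', mod(m, 60)): minutes are not padded."""
    return f"{minutes // 60}:{minutes % 60}"


def concat_div_lpad_mod(minutes: int) -> str:
    """Like concat(div(m, 60), ':', lpad(mod(m, 60), 2, '0')): minutes padded to 2 digits."""
    return f"{minutes // 60}:{minutes % 60:02d}"


for m in (685, 180, 65):
    print(m, concat_div_mod(m), concat_div_lpad_mod(m))
# 685 11:25 11:25
# 180 3:0 3:00
# 65 1:5 1:05
```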
Answer:
If you are working with an RDD, you can use `.map` to transform each element into one or more values.
Python's built-in function `divmod` returns the quotient and remainder of an integer division: `divmod(a, b)` is equivalent to `(a // b, a % b)`.
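```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # reuse the running SparkContext, or start one
rdd = sc.parallelize([685, 180, 80])
# divmod(x, 60) yields an (hours, minutes) tuple for each element
results = rdd.map(lambda x: divmod(x, 60))
print(results.collect())
# [(11, 25), (3, 0), (1, 20)]
```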
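Because `divmod` is ordinary Python, the equivalence with `(a // b, a % b)` can be checked without Spark. One caveat worth knowing: Python floors the quotient, so a negative input yields a negative quotient and a non-negative remainder. This only matters if the minutes column can contain negative values.

```python
# divmod(a, b) returns the pair (a // b, a % b)
hours, minutes = divmod(685, 60)
print(hours, minutes)  # 11 25

# Same pair via floor division and modulo
print((685 // 60, 685 % 60) == divmod(685, 60))  # True

# Floor division rounds toward negative infinity, so the remainder stays non-negative
print(divmod(-5, 60))  # (-1, 55)
```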
Or, if you want the results as strings in `hh:mm` format, use `str.format` to format the values to your liking:

```python
# '{:02d}' zero-pads each part to at least two digits
results = rdd.map(lambda x: '{:02d}:{:02d}'.format(*divmod(x, 60)))
print(results.collect())
# ['11:25', '03:00', '01:20']
```
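The same formatting can also be written with an f-string inside a named function, which makes it easy to test locally before handing it to `rdd.map`. A minimal sketch; `to_hhmm` is a hypothetical name, not part of any library:

```python
def to_hhmm(total_minutes: int) -> str:
    """Hypothetical helper: format a minute count as a zero-padded hh:mm string."""
    hours, minutes = divmod(total_minutes, 60)
    # :02d pads each field to at least two digits; hours may grow beyond two digits
    return f"{hours:02d}:{minutes:02d}"


print([to_hhmm(x) for x in [685, 180, 80, 6000]])
# ['11:25', '03:00', '01:20', '100:00']
```

With a running `rdd`, the function can be passed directly as `rdd.map(to_hhmm)`.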
If you want to keep both the number of minutes and the resulting `hh:mm` string:

```python
# Emit (minutes, 'hh:mm') pairs
results = rdd.map(lambda x: (x, '{:02d}:{:02d}'.format(*divmod(x, 60))))
print(results.collect())
# [(685, '11:25'), (180, '03:00'), (80, '01:20')]
```
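Keeping both values makes it easy to sanity-check the conversion by parsing each `hh:mm` string back into minutes and comparing it with the original count. A small sketch using the pairs shown above; `hhmm_to_minutes` is a hypothetical helper:

```python
pairs = [(685, '11:25'), (180, '03:00'), (80, '01:20')]


def hhmm_to_minutes(text: str) -> int:
    """Hypothetical helper: parse 'hh:mm' back into a total number of minutes."""
    hours, minutes = text.split(':')
    return int(hours) * 60 + int(minutes)


# Every string should round-trip to the minute count it came from
print(all(hhmm_to_minutes(text) == total for total, text in pairs))  # True
```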
