Replacing numerical values with a certain number in PySpark

Time:02-03

Here is my DataFrame:

+--------+-------------+
| subs_no|airport_score|
+--------+-------------+
|10385193|         1.85|
|10003076|       138.75|
|10100559|       382.95|
|10867116|         37.0|
|10164103|        12.95|
|10090458|         25.9|
|11049702|        12.95|
|10128459|          7.4|
|10064536|         5.55|
|10153463|         51.8|
|10040542|          3.7|
|10108980|         51.8|
|10003439|         14.8|
|10003375|          7.4|
|10012363|         29.6|
|10009808|         11.1|
|10001949|         1.85|
|10031025|        49.95|
|11020659|          3.7|
|10050972|         44.4|
+--------+-------------+

Here's what I want: every score greater than 100 should become 100.

+--------+-------------+
| subs_no|airport_score|
+--------+-------------+
|10385193|         1.85|
|10003076|          100|
|10100559|          100|
|10867116|         37.0|
|10164103|        12.95|
|10090458|         25.9|
|11049702|        12.95|
|10128459|          7.4|
|10064536|         5.55|
|10153463|         51.8|
|10040542|          3.7|
|10108980|         51.8|
|10003439|         14.8|
|10003375|          7.4|
|10012363|         29.6|
|10009808|         11.1|
|10001949|         1.85|
|10031025|        49.95|
|11020659|          3.7|
|10050972|         44.4|
+--------+-------------+

CodePudding user response:

You can do this with a when-otherwise expression: rows that match the condition get the replacement value, and all other rows keep their original value.

Data Preparation

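import pandas as pd
from pyspark.sql import SparkSession

# `sql` was not defined in the original snippet; a SparkSession is assumed here
sql = SparkSession.builder.getOrCreate()

# Sample scores: 0, 10, 20, ..., 190
df = pd.DataFrame({'airport_score': list(range(0, 200, 10))})

sparkDF = sql.createDataFrame(df)

sparkDF.show()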

+-------------+
|airport_score|
+-------------+
|            0|
|           10|
|           20|
|           30|
|           40|
|           50|
|           60|
|           70|
|           80|
|           90|
|          100|
|          110|
|          120|
|          130|
|          140|
|          150|
|          160|
|          170|
|          180|
|          190|
+-------------+

Case When

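from pyspark.sql import functions as F

# Cap the score: values of 100 or more become 100, all others are unchanged
sparkDF = sparkDF.withColumn(
    'airport_score',
    F.when(F.col('airport_score') >= 100, 100).otherwise(F.col('airport_score'))
)

sparkDF.show()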

+-------------+
|airport_score|
+-------------+
|            0|
|           10|
|           20|
|           30|
|           40|
|           50|
|           60|
|           70|
|           80|
|           90|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
+-------------+