Home > Back-end >  replace key value from dictionary
replace key value from dictionary

Time:01-14

Below is my DF:

deviceDict = {'TABLET' : 'MOBILE', 'PHONE':'MOBILE', 'PC':'Desktop', 'CEDEX' : '', 'ST' : 'SAINT', 'AV' : 'AVENUE', 'BD': 'BOULEVARD'}
df = spark.createDataFrame([('TABLET', 'DAF ST PAQ BD'), ('PHONE', 'AVOTHA'),  ('PC', 'STPA CEDEX'), ('OTHER', 'AV DAF'), (None, None)], ["device_type", 'City'])
df.show()

Output:

 ----------- ------------- 
|device_type|         City|
 ----------- ------------- 
|     TABLET|DAF ST PAQ BD|
|      PHONE|       AVOTHA|
|         PC|   STPA CEDEX|
|      OTHER|       AV DAF|
|       null|         null|
 ----------- ------------- 

The aim is to replace key/value, solution from Pyspark: Replacing value in a column by searching a dictionary

tests = df.na.replace(deviceDict, 1)

Result:

 ----------- ------------- 
|device_type|         City|
 ----------- ------------- 
|     MOBILE|DAF ST PAQ BD|
|     MOBILE|       AVOTHA|
|    Desktop|   STPA CEDEX|
|      OTHER|       AV DAF|
|       null|         null|
 ----------- ------------- 

It worked for device_type but I wasn't able to change the city (even when using subset)

Expected output:

 ----------- ------------------------ 
|device_type|                    City|
 ----------- ------------------------ 
|     MOBILE| DAF SAINT PAQ BOULEVARD|
|     MOBILE|                  AVOTHA|
|    Desktop|                    STPA|
|      OTHER|              AVENUE DAF|
|       null|                    null|
 ----------- ------------------------ 

CodePudding user response:

The replacement doesn't occur for the column City because you're trying to do some partial replacement in the column values. Whereas function DataFrame.replace uses the entire value as a mapping.

To achieve what you want for column City, you can use multiple nested regexp_replace expressions that you can dynamically generate using Python functools.reduce for example:

from functools import reduce
import pyspark.sql.functions as F

m = list(deviceDict.items())

df1 = df.na.replace(deviceDict, 1).withColumn(
    "City",
    reduce(
        lambda acc, x: F.regexp_replace(acc, rf"\b{x[0]}\b", x[1]),
        m[1:],
        F.regexp_replace(F.col("City"), rf"\b{m[0][0]}\b", m[0][1]),
    )
)

df1.show(truncate=False)
# ----------- ----------------------- 
#|device_type|City                   |
# ----------- ----------------------- 
#|MOBILE     |DAF SAINT PAQ BOULEVARD|
#|MOBILE     |AVOTHA                 |
#|Desktop    |STPA                   |
#|OTHER      |AVENUE DAF             |
#|null       |null                   |
# ----------- ----------------------- 

CodePudding user response:

I am using pandas dataframe and I think it will not be very different.

You will need to use "to_replace" option with a regex

# this will replace the old with new values given in deviceDict only if its a full string match
>>> print(deviceDict)

{'TABLET': 'MOBILE', 
'PHONE': 'MOBILE', 
'PC': 'Desktop', 
'CEDEX': '', 
'ST': 'SAINT', 
'AV': 'AVENUE', 
'BD': 'BOULEVARD'}

>>> print(df.replace(to_replace=deviceDict))

device_type           City
0      MOBILE  DAF ST PAQ BD
1      MOBILE         AVOTHA
2     Desktop     STPA CEDEX
3       OTHER         AV DAF
4        None           None

>>> df.replace(to_replace=r'\s*ST\s ', value=' SAINT ', regex = True)

device_type              City
0      TABLET  DAF SAINT PAQ BD
1       PHONE            AVOTHA
2          PC        STPA CEDEX
3       OTHER            AV DAF
4        None              None


>>> print(df.replace(to_replace=r'\s*AV\s ', value=' AVENUE ', regex = True))

  device_type           City
0      TABLET  DAF ST PAQ BD
1       PHONE         AVOTHA
2          PC     STPA CEDEX
3       OTHER     AVENUE DAF
4        None           None

The more generic solution will be

But I dont think you want to change AVOTHA --> AVENUEOTHA or, STPA --> SAINTPA

If you are fine with that change then this will work

>>> print(df.replace(to_replace=deviceDict, regex=True))

device_type                     City
0      MOBILE  DAF SAINT PAQ BOULEVARD
1      MOBILE               AVENUEOTHA
2     Desktop                 SAINTPA 
3       OTHER               AVENUE DAF
4        None                     None
  •  Tags:  
  • Related