Home > Back-end >  Change string values with new values contain in another data frame
Change string values with new values contain in another data frame

Time:01-07

I have a csv with thousand of rows with sales data as follows:

pd.DataFrame({
    'Item_name': ['guacamole', 'morita', 'verde', 'pico', 'tomatillo'],
    'Inv_number': ['0001', '0002', '0003', '0004', '0005'],
    'Store_name': ['alex', 'pusateris', 'wholefoods','longos', 'metro']

Now the item names have changed to the following:

pd.DataFrame ({
'Item_name': ['Dip guacamole', 'morita Spicy', ' Salsa verde', 'Pico de Gallo', 'Roasted tomatillo']

What I am looking to achieve is to change the old name to the new. I am using the following code for each item, but this is going to take forever!

sales_df['item_code']= sales_df['item_code'].replace({'Guacamole':'Dip Guacamole'}) 

Is there a way to simplify this code? Maybe create a list with the new names and iterate through the sales data?

Looking forward to hearing your comments.

Thank you!

CodePudding user response:

Use fuzzy logic here.

# Python env: pip install thefuzz
# Anaconda env: conda install thefuzz

from thefuzz import process

THRESHOLD = 90  # reject all values below this score (%)

# df: your original dataframe
# df1: your new names
df['Item_name_new'] = \
    df['Item_name'].apply(lambda x: process.extractOne(x, df1['Item_name'],
                              score_cutoff=THRESHOLD)).str[0]
print(df)

# Output
   Item_name Inv_number  Store_name      Item_name_new
0  guacamole       0001        alex      Dip guacamole
1     morita       0002   pusateris       morita Spicy
2      verde       0003  wholefoods        Salsa verde
3       pico       0004      longos      Pico de Gallo
4  tomatillo       0005       metro  Roasted tomatillo
5      water       0006      nature               None

CodePudding user response:

You can use the replace function:

dic = {'Guacamole':'Dip Guacamole', 'morita': 'morita Spicy'}
sales_df = sales_df.replace({"item_code": dic})

CodePudding user response:

If the inventory number stayed the same, you should use it as an index. I would try creating a map between the index and the name and applying it on the old table:

name_dict = new_df.set_index("Inv_number")["Item_name'"].drop_duplicates()
old_df["new_names"] = old_df["Inv_number"].map(name_dict)
  •  Tags:  
  • Related