Pandas sum corresponding values based on values in another column

Time:01-05

I have a table, df1, with columns Itemlist1 and Itemlist2. Each cell holds one or more items as a comma-separated string. For example, one cell may contain a single item (A), another two items (B, C), and another three items (A, D, E).

df1

I have another table, df2, with the Price and Cost of each item.

df2

I want to create a final df with 2 new columns added to df1, Totalprice and Totalcost. For each row of df1, Totalprice is the sum of the prices of all items in that row, and Totalcost is the sum of their costs. Example: in the 2nd row, Totalprice is the sum of the prices of items B, A, D and Totalcost is the sum of the costs of items B, A, D.

finaldf

I am thinking of combining all the items into a single column in df1, splitting that column so each item gets its own column, and then merging the result with df2. This is what I have so far.

df1["items"] = df1[["Itemlist1", "Itemlist2"]].agg(', '.join, axis=1)

df3 = df1['items'].str.split(', ', expand=True)
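To illustrate what these two steps produce, here is a small self-contained sketch. The sample rows are hypothetical (the original tables are not reproduced in the post); only the column names Itemlist1 and Itemlist2 come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real df1 is not shown in the post.
df1 = pd.DataFrame({
    "Itemlist1": ["A", "B", "A, D, E"],
    "Itemlist2": ["B, C", "A, D", "C"],
})

# Join both list columns into one comma-separated string per row.
df1["items"] = df1[["Itemlist1", "Itemlist2"]].agg(", ".join, axis=1)

# Split into one column per item; rows with fewer items are padded with None.
df3 = df1["items"].str.split(", ", expand=True)
print(df3)
```

Because rows have different item counts, df3 ends up with as many columns as the longest row, and the shorter rows carry missing values in the trailing columns.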

As the number of items in each row is not fixed and the column names do not match either, how can I loop over this to merge it with df2?

Alternatively, is there a better approach to arrive at the final df I want? Any suggestions please. Thank you.

CodePudding user response:

Starting from your df3, replace each item code with its value from df2, then sum across each row with axis=1.

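# Lookup tables: item code -> cost, item code -> price
cost_dict = dict(zip(df2.Itemcode, df2.Cost))
price_dict = dict(zip(df2.Itemcode, df2.Price))
# Swap item codes for numbers, then add up each row (missing cells are skipped)
df1['Totalcost'] = df3.replace(cost_dict).sum(axis=1)
df1['Totalprice'] = df3.replace(price_dict).sum(axis=1)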
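As a possible alternative that avoids the wide df3 altogether, the items can be turned into one long Series with explode, mapped to their values, and summed per original row. This is only a sketch under assumptions: the sample values below are made up, and df2 is assumed to have the columns Itemcode, Price and Cost used in the answer above.

```python
import pandas as pd

# Hypothetical sample data; the real tables are not shown in the post.
df1 = pd.DataFrame({
    "Itemlist1": ["A", "B", "A, D, E"],
    "Itemlist2": ["B, C", "A, D", "C"],
})
df2 = pd.DataFrame({
    "Itemcode": ["A", "B", "C", "D", "E"],
    "Price": [10, 20, 30, 40, 50],
    "Cost": [1, 2, 3, 4, 5],
})

# One item per row; the index still points back to the original df1 row.
items = (
    df1[["Itemlist1", "Itemlist2"]]
    .agg(", ".join, axis=1)
    .str.split(", ")
    .explode()
)

# Look up each item's value, then sum within each original row.
lookup = df2.set_index("Itemcode")
df1["Totalprice"] = items.map(lookup["Price"]).groupby(level=0).sum()
df1["Totalcost"] = items.map(lookup["Cost"]).groupby(level=0).sum()
print(df1)
```

Item codes that do not appear in df2 map to NaN and are skipped by the sum, so it may be worth checking for unmatched codes before trusting the totals.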