Home > database >  python pandas. Sum of a column based on a match of string column in a comma separated values of anot
python pandas. Sum of a column based on a match of string column in a comma separated values of anot

Time:01-30

I have 2 data frames sample, with names changed:

df1 =

Comp_code DepartmentListA DepartmentListB
Code_1 "Dept1" "Dept3"
Code_2 "Dept2" "Dept4"
Code_3 "Dept4, Dept5" "Dept1"
Code_4 "Dept1,Dept5, Dept6" "Dept3, Dept4"

df2 = with just department and revenue

DepartmentList Revenue Gross Margin
"Dept1" 1000 500
"Dept2" 2000 0
"Dept3, 5000 900
"Dept4" 5000 200
"Dept5" 7000 -100
"Dept6" 8000 2500

I would like my final df to have company codes and total revenue and gross margin. Summing up totals of departments in column A and B. I am not able to iterate and join because of the comma separated string of departments. My final DF should be something like this

expected df =

Comp_code GrossRev Tot Margin
Code_1 6000 1400
Code_2 7000 200
Code_3 13000 600
Code_4 26000 4000

Also the data frames are both a couple of million rows and some department lists (comma separated values) are about 100 in number. Would be good if there was a performant way of doing this.

CodePudding user response:

This code works. It's very long, but it's mostly just repetitive.

new_df = df1[['Comp_code']].copy()
new_df['GrossRev'] = df1['DepartmentListB'].str.split(',').explode().map(df2.set_index('DepartmentList')['Revenue']).groupby(level=0).sum()   df1['DepartmentListA'].str.split(',').explode().map(df2.set_index('DepartmentList')['Revenue']).groupby(level=0).sum()
new_df['Tot Margin'] = df1['DepartmentListB'].str.split(',').explode().map(df2.set_index('DepartmentList')['GrossMargin']).groupby(level=0).sum()   df1['DepartmentListA'].str.split(',').explode().map(df2.set_index('DepartmentList')['GrossMargin']).groupby(level=0).sum()

Output:

>>> new_df
  Comp_code  GrossRev  Tot Margin
0    Code_1      6000        1400
1    Code_2      7000         200
2    Code_3     13000         600
3    Code_4     26000        4000
  •  Tags:  
  • Related