python pandas. Sum of a column based on a match of string column in a comma separated values of anot-CodePudding

I have 2 data frames sample, with names changed:

df1 =

Comp_code	DepartmentListA	DepartmentListB
Code_1	"Dept1"	"Dept3"
Code_2	"Dept2"	"Dept4"
Code_3	"Dept4, Dept5"	"Dept1"
Code_4	"Dept1,Dept5, Dept6"	"Dept3, Dept4"

df2 = with just department and revenue

DepartmentList	Revenue	Gross Margin
"Dept1"	1000	500
"Dept2"	2000	0
"Dept3,	5000	900
"Dept4"	5000	200
"Dept5"	7000	-100
"Dept6"	8000	2500

I would like my final df to have company codes and total revenue and gross margin. Summing up totals of departments in column A and B. I am not able to iterate and join because of the comma separated string of departments. My final DF should be something like this

expected df =

Comp_code	GrossRev	Tot Margin
Code_1	6000	1400
Code_2	7000	200
Code_3	13000	600
Code_4	26000	4000

Also the data frames are both a couple of million rows and some department lists (comma separated values) are about 100 in number. Would be good if there was a performant way of doing this.

CodePudding user response：

This code works. It's very long, but it's mostly just repetitive.

new_df = df1[['Comp_code']].copy()
new_df['GrossRev'] = df1['DepartmentListB'].str.split(',').explode().map(df2.set_index('DepartmentList')['Revenue']).groupby(level=0).sum()   df1['DepartmentListA'].str.split(',').explode().map(df2.set_index('DepartmentList')['Revenue']).groupby(level=0).sum()
new_df['Tot Margin'] = df1['DepartmentListB'].str.split(',').explode().map(df2.set_index('DepartmentList')['GrossMargin']).groupby(level=0).sum()   df1['DepartmentListA'].str.split(',').explode().map(df2.set_index('DepartmentList')['GrossMargin']).groupby(level=0).sum()

Output:

>>> new_df
  Comp_code  GrossRev  Tot Margin
0    Code_1      6000        1400
1    Code_2      7000         200
2    Code_3     13000         600
3    Code_4     26000        4000