I have a dataframe looking like this:
| Store | Prod. | value | type |
|---|---|---|---|
| US | WR | 12 | D |
| US | WR | 23 | D |
| US | WR | 12 | Y |
| US | TX | 44 | D |
| US | TX | 55 | W |
| MEX | WR | 54 | D |
| MEX | WR | 3 | Y |
| MEX | WR | 4 | Y |
| MEX | WF | 135 | D |
| MEX | WF | 144 | D |
| ... | ... | ... | ... |
My goal is to split the dataframe into 3 dataframes based on the "type" column. The rule should be: If the group ("Store" and "Prod." combination) has at least one type "Y" in it it should go into dataframe 'dfy'. If there is no Y but a W in the group it goes into dataframe 'dfw'. If there is only D in the group it goes into dataframe 'dfd'.
CodePudding user response:
You can define a labeler function and transform the type series of each group:
def labeler(s):
if (s == 'D').all():
return 'dfd'
elif (s == 'Y').sum() > 0:
return 'dfy'
elif (s == 'W').sum() > 0:
return 'dfw'
return 'other'
df['dfname'] = df.groupby(['Store', 'Prod.'])['type'].transform(labeler)
print(df)
Store Prod. value type dfname
0 US WR 12 D dfy
1 US WR 23 D dfy
2 US WR 12 Y dfy
3 US TX 44 D dfw
4 US TX 55 W dfw
5 MEX WR 54 D dfy
6 MEX WR 3 Y dfy
7 MEX WR 4 Y dfy
8 MEX WF 135 D dfd
9 MEX WF 144 D dfd
Then you can create the three desired DataFrames with:
dfy = df[df['dfname'] == 'dfy'].reset_index(drop=True)
dfw = df[df['dfname'] == 'dfw'].reset_index(drop=True)
dfd = df[df['dfname'] == 'dfd'].reset_index(drop=True)
