Select a group based on a ranking of column values in python-CodePudding

I have a dataframe looking like this:

Store	Prod.	value	type
US	WR	12	D
US	WR	23	D
US	WR	12	Y
US	TX	44	D
US	TX	55	W
MEX	WR	54	D
MEX	WR	3	Y
MEX	WR	4	Y
MEX	WF	135	D
MEX	WF	144	D
...	...	...	...

My goal is to split the dataframe into 3 dataframes based on the "type" column. The rule should be: If the group ("Store" and "Prod." combination) has at least one type "Y" in it it should go into dataframe 'dfy'. If there is no Y but a W in the group it goes into dataframe 'dfw'. If there is only D in the group it goes into dataframe 'dfd'.

CodePudding user response：

You can define a labeler function and transform the type series of each group:

def labeler(s):
    if (s == 'D').all():
        return 'dfd'
    elif (s == 'Y').sum() > 0:
        return 'dfy'
    elif (s == 'W').sum() > 0:
        return 'dfw'
    return 'other'

df['dfname'] = df.groupby(['Store', 'Prod.'])['type'].transform(labeler)

print(df)

  Store Prod.  value type dfname
0    US    WR     12    D    dfy
1    US    WR     23    D    dfy
2    US    WR     12    Y    dfy
3    US    TX     44    D    dfw
4    US    TX     55    W    dfw
5   MEX    WR     54    D    dfy
6   MEX    WR      3    Y    dfy
7   MEX    WR      4    Y    dfy
8   MEX    WF    135    D    dfd
9   MEX    WF    144    D    dfd

Then you can create the three desired DataFrames with:

dfy = df[df['dfname'] == 'dfy'].reset_index(drop=True)
dfw = df[df['dfname'] == 'dfw'].reset_index(drop=True)
dfd = df[df['dfname'] == 'dfd'].reset_index(drop=True)

Store	Prod.	value	type
US	WR	12	D
US	WR	23	D
US	WR	12	Y
US	TX	44	D
US	TX	55	W
MEX	WR	54	D
MEX	WR	3	Y
MEX	WR	4	Y
MEX	WF	135	D
MEX	WF	144	D
...	...	...	...

Store	Prod.	value	type
US	WR	12	D
US	WR	23	D
US	WR	12	Y
US	TX	44	D
US	TX	55	W
MEX	WR	54	D
MEX	WR	3	Y
MEX	WR	4	Y
MEX	WF	135	D
MEX	WF	144	D
...	...	...	...

Store	Prod.	value	type
US	WR	12	D
US	WR	23	D
US	WR	12	Y
US	TX	44	D
US	TX	55	W
MEX	WR	54	D
MEX	WR	3	Y
MEX	WR	4	Y
MEX	WF	135	D
MEX	WF	144	D
...	...	...	...