Home > Net >  Select a group based on a ranking of column values in python
Select a group based on a ranking of column values in python

Time:01-12

I have a dataframe looking like this:

Store Prod. value type
US WR 12 D
US WR 23 D
US WR 12 Y
US TX 44 D
US TX 55 W
MEX WR 54 D
MEX WR 3 Y
MEX WR 4 Y
MEX WF 135 D
MEX WF 144 D
... ... ... ...

My goal is to split the dataframe into 3 dataframes based on the "type" column. The rule should be: If the group ("Store" and "Prod." combination) has at least one type "Y" in it it should go into dataframe 'dfy'. If there is no Y but a W in the group it goes into dataframe 'dfw'. If there is only D in the group it goes into dataframe 'dfd'.

CodePudding user response:

You can define a labeler function and transform the type series of each group:

def labeler(s):
    if (s == 'D').all():
        return 'dfd'
    elif (s == 'Y').sum() > 0:
        return 'dfy'
    elif (s == 'W').sum() > 0:
        return 'dfw'
    return 'other'

df['dfname'] = df.groupby(['Store', 'Prod.'])['type'].transform(labeler)

print(df)

  Store Prod.  value type dfname
0    US    WR     12    D    dfy
1    US    WR     23    D    dfy
2    US    WR     12    Y    dfy
3    US    TX     44    D    dfw
4    US    TX     55    W    dfw
5   MEX    WR     54    D    dfy
6   MEX    WR      3    Y    dfy
7   MEX    WR      4    Y    dfy
8   MEX    WF    135    D    dfd
9   MEX    WF    144    D    dfd

Then you can create the three desired DataFrames with:

dfy = df[df['dfname'] == 'dfy'].reset_index(drop=True)
dfw = df[df['dfname'] == 'dfw'].reset_index(drop=True)
dfd = df[df['dfname'] == 'dfd'].reset_index(drop=True)
  •  Tags:  
  • Related