I have a DataFrame that I was able to filter with groupby and a lambda. Instead of filtering, I'd like to add a new column indicating whether each row's group meets the condition, but I'm getting an error when using apply.
Note that the filter keeps a group only if all of its hr values are >= 5. So the new column should say 'yes' when every value in the group is >= 5, and 'no' if one or more of the values in the group are below 5.
Original code that was used for filtering:
df=df.groupby('id').filter(lambda x: ((x.hr >= 5)).all())
import pandas as pd

# Sample data; new_col holds the expected result
data = {
    "id": [11111, 11111, 11111, 11111,
           333, 333, 333, 333, 333, 333,
           5678, 5678, 5678, 5678, 5678],
    "hr": [4, 2, 5, 4, 5, 7, 6, 8, 5, 6, 7, 8, 6, 2, 4],
    "new_col": ['no', 'no', 'no', 'no', 'yes', 'yes',
                'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no', 'no', 'no']}
df = pd.DataFrame(data)
original table:
╔═══════╦════╗
║ id    ║ hr ║
╠═══════╬════╣
║ 11111 ║ 4  ║
║ 11111 ║ 2  ║
║ 11111 ║ 5  ║
║ 11111 ║ 4  ║
║ 333   ║ 5  ║
║ 333   ║ 7  ║
║ 333   ║ 6  ║
║ 333   ║ 8  ║
║ 333   ║ 5  ║
║ 333   ║ 6  ║
║ 5678  ║ 7  ║
║ 5678  ║ 8  ║
║ 5678  ║ 6  ║
║ 5678  ║ 2  ║
║ 5678  ║ 4  ║
╚═══════╩════╝
Result I'm trying to get:
╔═══════╦════╦═════════╗
║ id    ║ hr ║ new_col ║
╠═══════╬════╬═════════╣
║ 11111 ║ 4  ║ no      ║
║ 11111 ║ 2  ║ no      ║
║ 11111 ║ 5  ║ no      ║
║ 11111 ║ 4  ║ no      ║
║ 333   ║ 5  ║ yes     ║
║ 333   ║ 7  ║ yes     ║
║ 333   ║ 6  ║ yes     ║
║ 333   ║ 8  ║ yes     ║
║ 333   ║ 5  ║ yes     ║
║ 333   ║ 6  ║ yes     ║
║ 5678  ║ 7  ║ no      ║
║ 5678  ║ 8  ║ no      ║
║ 5678  ║ 6  ║ no      ║
║ 5678  ║ 2  ║ no      ║
║ 5678  ║ 4  ║ no      ║
╚═══════╩════╩═════════╝
kindly advise. thank you
CodePudding user response:
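You can use groupby with transform, where a lambda evaluates the condition once per group and the result is broadcast back to every row of that group. Then use np.where (this needs import numpy as np) to turn the resulting booleans into "yes"/"no" values: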
df['new_col'] = np.where(df.groupby('id')['hr'].transform(lambda x: (x>=5).all()), 'yes', 'no')
Output:
       id  hr  new_col
0   11111   4       no
1   11111   2       no
2   11111   5       no
3   11111   4       no
4     333   5      yes
5     333   7      yes
6     333   6      yes
7     333   8      yes
8     333   5      yes
9     333   6      yes
10   5678   7       no
11   5678   8       no
12   5678   6       no
13   5678   2       no
14   5678   4       no
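
For reference, here is a self-contained sketch of the same idea that can be pasted and run as-is. It rebuilds the sample frame from the question (without the expected new_col) and also shows a lambda-free variant: compare first, then group the boolean result by id and transform with "all". The name all_ge_5 is just an illustrative variable, and whether the lambda-free form is actually faster on your data is an assumption worth measuring rather than a given.

```python
import numpy as np
import pandas as pd

# Sample data from the question, without the expected column
df = pd.DataFrame({
    "id": [11111] * 4 + [333] * 6 + [5678] * 5,
    "hr": [4, 2, 5, 4, 5, 7, 6, 8, 5, 6, 7, 8, 6, 2, 4],
})

# Variant from the answer: evaluate the condition per group with a lambda
via_lambda = df.groupby("id")["hr"].transform(lambda x: (x >= 5).all())

# Lambda-free variant: flag each row first, then ask per id
# whether every flag in the group is True
all_ge_5 = df["hr"].ge(5).groupby(df["id"]).transform("all")

# Map the per-row booleans to the labels the question asks for
df["new_col"] = np.where(all_ge_5, "yes", "no")
print(df)
```

Both variants return one boolean per original row, so the result lines up with df without any merge or reindexing, which is the main reason to prefer transform over apply here.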
