I have currently the following issue where I'm looking to drop rows in a pandas dataframe before a condition is met. I would like to drop every row prior to the condition of 10 or greater is met in the column "Number", depending on the Name column. Suppose dataframe (df) has two columns called 'Name' and 'Number'. I desire to drop all rows of each unique name before the condition is met, and keep all rows after.
Name Number
Matt 4
Matt 5
Matt 13
Matt 4
Sophie 5
Sophie 14
Sophie 18
Steve 5
Steve 4
Steve 21
Desired output:
Name Number
Matt 13
Matt 4
Sophie 14
Sophie 18
Steve 21
The following code below drops all the items after the condition is met, but this is applied for the entire series / column of values. I would like to do this separately for each unique name occurring. Is there any way to do this, perhaps with a groupby?
df[(df['Number'] > 10).argmax():]
Thank you very much in advance for any help!
CodePudding user response:
You can use groupby cummax. cummax converts every False after True into True, so if you groupby "Name", it creates a boolean mask where every value after the value greater than 10 is selected:
msk = df['Number'].gt(10).groupby(df['Name']).cummax()
out = df[msk]
Output:
Name Number
2 Matt 13
3 Matt 4
5 Sophie 14
6 Sophie 18
9 Steve 21
CodePudding user response:
Use GroupBy.cummax with condition for get greater values like 10 in Series.gt and filter by boolean indexing:
df = df[df['Number'].gt(10).groupby(df['Name']).cummax()]
print (df)
Name Number
2 Matt 13
3 Matt 4
5 Sophie 14
6 Sophie 18
9 Steve 21
Detail: Trues are repeated to end of group after first match:
print (df['Number'].gt(10).groupby(df['Name']).cummax())
1 False
2 True
3 True
4 False
5 True
6 True
7 False
8 False
9 True
Name: Number, dtype: bool
