Home > Software engineering >  Pandas keep rows after a condition is true for each group
Pandas keep rows after a condition is true for each group

Time:02-03

I have currently the following issue where I'm looking to drop rows in a pandas dataframe before a condition is met. I would like to drop every row prior to the condition of 10 or greater is met in the column "Number", depending on the Name column. Suppose dataframe (df) has two columns called 'Name' and 'Number'. I desire to drop all rows of each unique name before the condition is met, and keep all rows after.

Name  Number
Matt    4
Matt    5
Matt    13
Matt    4
Sophie  5 
Sophie  14
Sophie  18
Steve   5
Steve   4
Steve   21 

Desired output: 
Name   Number
Matt    13
Matt    4
Sophie  14
Sophie  18
Steve   21

The following code below drops all the items after the condition is met, but this is applied for the entire series / column of values. I would like to do this separately for each unique name occurring. Is there any way to do this, perhaps with a groupby?

df[(df['Number'] > 10).argmax():]

Thank you very much in advance for any help!

CodePudding user response:

You can use groupby cummax. cummax converts every False after True into True, so if you groupby "Name", it creates a boolean mask where every value after the value greater than 10 is selected:

msk = df['Number'].gt(10).groupby(df['Name']).cummax()
out = df[msk]

Output:

     Name  Number
2    Matt      13
3    Matt       4
5  Sophie      14
6  Sophie      18
9   Steve      21

CodePudding user response:

Use GroupBy.cummax with condition for get greater values like 10 in Series.gt and filter by boolean indexing:

df = df[df['Number'].gt(10).groupby(df['Name']).cummax()]
print (df)
     Name  Number
2    Matt      13
3    Matt       4
5  Sophie      14
6  Sophie      18
9   Steve      21

Detail: Trues are repeated to end of group after first match:

print (df['Number'].gt(10).groupby(df['Name']).cummax())
1    False
2     True
3     True
4    False
5     True
6     True
7    False
8    False
9     True
Name: Number, dtype: bool
  •  Tags:  
  • Related