I have a dataframe df :-
| ID | Date | Event |
|---|---|---|
| 3 | 04-09-2013 | Success |
| 3 | 27-09-2013 | Success |
| 3 | 14-10-2013 | Success |
| 3 | 15-10-2013 | |
| 3 | 18-10-2013 | Form |
| 3 | 19-10-2013 | Click |
If the first event for every group of ID is Success or followed by Consecutive Success(note
strictly start with and /or followed by consecutive Success )then I want to drop the ID from the dataframe(Drop all the rows for that ID from the df)
Also if the case where the group is this way :-
| ID | Date | Event |
|---|---|---|
| 4 | 04-09-2013 | Success |
| 4 | 27-09-2013 | Success |
| 4 | 14-10-2013 | |
| 4 | 15-10-2013 | |
| 4 | 18-10-2013 | Success |
| 4 | 18-10-2013 | Success |
| 4 | 18-10-2013 | Click |
Expected output to be (i.e a.)remove rows if the first event/consecutive events are Success and also b.)drop the events(rows) after the last/latest success within each group(taken one example of Event click but if there are more events those are to be dropped as well):
| ID | Date | Event |
|---|---|---|
| 4 | 14-10-2013 | |
| 4 | 15-10-2013 | |
| 4 | 18-10-2013 | Success |
| 4 | 18-10-2013 | Success |
b.) is valid for every group irrespective of how the Event starts( for example ) :-
| ID | Date | Event |
|---|---|---|
| 5 | 14-10-2013 | |
| 5 | 15-10-2013 | Click |
| 5 | 18-10-2013 | Success |
| 5 | 18-10-2013 |
Expected :-
| ID | Date | Event |
|---|---|---|
| 5 | 14-10-2013 | |
| 5 | 15-10-2013 | Click |
| 5 | 18-10-2013 | Success |
How can this be achieved?
CodePudding user response:
Use:
#remove values after last Success per groups not remove only non Success groups
df = df.iloc[::-1]
m = df['Event'].eq('Success')
df = df[~m.groupby(df['ID']).transform('all') | m.groupby(df['ID']).cummax()].iloc[::-1]
#remove first conscutive Success per groups
m1 = df['Event'].ne(df.groupby(df['ID'])['Event'].shift()).groupby(df['ID']).cumsum().ne(1)
#alternative solution
#m1 = df['Event'].mask(df['Event'].eq('Success')).groupby(df['ID']).ffill().notna()
m2 = df['Event'].ne('Success')
df = df[m1 | m2]
print (df)
ID Date Event
8 4 14-10-2013 Email
9 4 15-10-2013 Email
10 4 18-10-2013 Success
11 4 18-10-2013 Success
13 5 14-10-2013 Email
14 5 15-10-2013 Click
15 5 18-10-2013 Success
