I have a dataframe like this.
ID,group,event
A,0,0
A,1,0
B,0,1
B,1,1
C,0,1
C,1,0
D,0,0
D,1,1
E,0,0
F,0,1
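To make the example reproducible, the sample table above can be loaded straight from its CSV text. This is a minimal sketch that assumes pandas is installed. The variable name `csv_text` is illustrative only.

```python
import io

import pandas as pd

# Sample data from the question, kept as CSV text (illustrative variable name)
csv_text = """ID,group,event
A,0,0
A,1,0
B,0,1
B,1,1
C,0,1
C,1,0
D,0,0
D,1,1
E,0,0
F,0,1
"""

# read_csv accepts any file-like object, so wrap the string in StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```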
I want to drop some of the duplicate rows based on 'ID' and a condition: if an ID has a row with group=0 and event=1, then delete that ID's duplicate row in group=1; otherwise, do not drop any duplicates.
So the desired DataFrame is:
ID,group,event
A,0,0
A,1,0
B,0,1
C,0,1
D,0,0
D,1,1
E,0,0
F,0,1
Answer:
Based on the expected output, you need to remove the group=1 row of every ID that also has a row with group=0 and event=1:
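import pandas as pd

# df is the DataFrame from the question (columns ID, group, event)
# first condition: rows that belong to group 1
m0 = df['group'].eq(1)
# second condition: rows whose ID has at least one row with group=0 and event=1
m1 = df['ID'].isin(df.loc[df['group'].eq(0) & df['event'].eq(1), 'ID'])
# keep every row except those matching both conditions
df = df[~m0 | ~m1]
# alternative solution (equivalent by De Morgan's law)
# df = df[~(m0 & m1)]
print(df)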
  ID  group  event
0  A      0      0
1  A      1      0
2  B      0      1
4  C      0      1
6  D      0      0
7  D      1      1
8  E      0      0
9  F      0      1
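As an alternative sketch (a different technique from the `isin` mask in the answer, not the answerer's method), the per-ID check can be computed with `groupby(...).transform('any')`. This broadcasts "does this ID have a row with group=0 and event=1?" back onto every row of that ID. It assumes the same column names as the question, and the intermediate names `trigger` and `id_has_trigger` are illustrative.

```python
import io

import pandas as pd

# Same sample data as in the question
csv_text = """ID,group,event
A,0,0
A,1,0
B,0,1
B,1,1
C,0,1
C,1,0
D,0,0
D,1,1
E,0,0
F,0,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# True on rows that have group=0 and event=1
trigger = df['group'].eq(0) & df['event'].eq(1)
# For each row, True if any row with the same ID is a trigger row
id_has_trigger = trigger.groupby(df['ID']).transform('any')
# Drop group=1 rows only for IDs that have a trigger row
result = df[~(df['group'].eq(1) & id_has_trigger)]
print(result)
```

Both approaches keep the original index, so the dropped rows (3 and 5 here) are easy to spot. The `transform` version avoids building a separate list of IDs, which can be convenient if the per-ID condition later grows more complex.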
