Drop some duplicate rows based on a condition

Time:01-12

I have a dataframe like this.

ID,group,event
A,0,0
A,1,0
B,0,1
B,1,1
C,0,1
C,1,0
D,0,0
D,1,1
E,0,0
F,0,1

I want to drop some of the duplicate rows based on 'ID' and a condition: if an ID has a row with group=0 and event=1, delete that ID's duplicate row with group=1; otherwise, keep the duplicates.

So the desired dataframe looks like this:

ID,group,event
A,0,0
A,1,0
B,0,1
C,0,1
D,0,0
D,1,1
E,0,0
F,0,1

CodePudding user response:

Judging from the expected output, you need to remove the group=1 rows for every ID that also has a row with group=0 and event=1:

# rows that belong to group 1
m0 = df['group'].eq(1)
# rows whose ID has at least one row with group=0 and event=1
m1 = df['ID'].isin(df.loc[df['group'].eq(0) & df['event'].eq(1), 'ID'])

# keep every row except those where both masks are True
df = df[~m0 | ~m1]
# equivalent alternative (De Morgan's law)
# df = df[~(m0 & m1)]
print(df)
  ID  group  event
0  A      0      0
1  A      1      0
2  B      0      1
4  C      0      1
6  D      0      0
7  D      1      1
8  E      0      0
9  F      0      1
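
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same masking idea. The sample frame is rebuilt from the data in the question; the helper name `drop_flagged_group1` is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": list("AABBCCDDEF"),
    "group": [0, 1, 0, 1, 0, 1, 0, 1, 0, 0],
    "event": [0, 0, 1, 1, 1, 0, 0, 1, 0, 1],
})


def drop_flagged_group1(frame):
    """Hypothetical helper: drop group=1 rows for IDs that have a group=0, event=1 row."""
    # Rows that belong to group 1.
    is_group1 = frame["group"].eq(1)
    # IDs that have at least one row with group=0 and event=1.
    flagged_ids = frame.loc[frame["group"].eq(0) & frame["event"].eq(1), "ID"]
    has_flag = frame["ID"].isin(flagged_ids)
    # Keep everything except group=1 rows of flagged IDs.
    return frame[~(is_group1 & has_flag)]


result = drop_flagged_group1(df)
print(result)
```

Wrapping the masks in a function leaves the original `df` untouched, which can be handy if you want to compare the frame before and after filtering.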