Home > database >  Compare rows and remove previous row in pandas
Compare rows and remove previous row in pandas

Time:02-02

I currently have a df that looks like

username amount verified
amy123 25 no
bob1 25 yes
amy123 26 yes
bob1 40 yes
bob1 41 yes

wondering how I can compare the amount and if the amount difference is <=1 and only if the previous row is the same user, i will keep the bigger number the final df will look like:

username amount verified
amy123 25 no
bob1 25 yes
amy123 26 yes
bob1 41 yes

CodePudding user response:

You can groupby and mask the dataframe on two conditions using (1) .shift() to compare with previous 'username' and (2) .diff() to handle the difference in 'amount'

#import packages
import pandas as pd
import numpy as np

#create the df
d = {'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
     'amount': [25,25,26,40,41],
     'verified': ['no','yes','yes','yes','yes']}
df = pd.DataFrame.from_dict(d)

#mask the df on two conditions 
df[((df['username'].shift() == df['username']) &  #keep if above user is the same
     (df.groupby('username')['amount'].diff() <= 1))] #keep if difference is less than or equal to 1

CodePudding user response:

You can compute masks for the difference in amount and the change in user and use it to groupby and get the max:

# is the difference > 1?
m1 = df['amount'].diff().gt(1)
# has the user changed?
m2 = df['username'].ne(df['username'].shift())

# keep rows matching either condition with the max amount
(df.groupby((m1|m2).cumsum(), group_keys=False)
   .apply(lambda g: g.sort_values('amount').tail(1))
 )

output:

  username  amount verified
0  amy123       25       no
1    bob1       25      yes
2  amy123       26      yes
4    bob1       41      yes

CodePudding user response:

You can compare the next entry using shift(-1):

import pandas as pd
import numpy as np
    
df = pd.DataFrame({
    'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
    'amount': [25,25,26,40,41],
    'verified': ['no','yes','yes','yes','yes']}
)

df = df[~((df["username"] == df["username"].shift(-1))
 & (df["amount"].shift(-1) - df["amount"] <= 1))]

~ is "not", shift(-1) the next entry

  •  Tags:  
  • Related