I currently have a df that looks like
| username | amount | verified |
|---|---|---|
| amy123 | 25 | no |
| bob1 | 25 | yes |
| amy123 | 26 | yes |
| bob1 | 40 | yes |
| bob1 | 41 | yes |
wondering how I can compare the amount and if the amount difference is <=1 and only if the previous row is the same user, i will keep the bigger number the final df will look like:
| username | amount | verified |
|---|---|---|
| amy123 | 25 | no |
| bob1 | 25 | yes |
| amy123 | 26 | yes |
| bob1 | 41 | yes |
CodePudding user response:
You can groupby and mask the dataframe on two conditions using (1) .shift() to compare with previous 'username' and (2) .diff() to handle the difference in 'amount'
#import packages
import pandas as pd
import numpy as np
#create the df
d = {'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
'amount': [25,25,26,40,41],
'verified': ['no','yes','yes','yes','yes']}
df = pd.DataFrame.from_dict(d)
#mask the df on two conditions
df[((df['username'].shift() == df['username']) & #keep if above user is the same
(df.groupby('username')['amount'].diff() <= 1))] #keep if difference is less than or equal to 1
CodePudding user response:
You can compute masks for the difference in amount and the change in user and use it to groupby and get the max:
# is the difference > 1?
m1 = df['amount'].diff().gt(1)
# has the user changed?
m2 = df['username'].ne(df['username'].shift())
# keep rows matching either condition with the max amount
(df.groupby((m1|m2).cumsum(), group_keys=False)
.apply(lambda g: g.sort_values('amount').tail(1))
)
output:
username amount verified
0 amy123 25 no
1 bob1 25 yes
2 amy123 26 yes
4 bob1 41 yes
CodePudding user response:
You can compare the next entry using shift(-1):
import pandas as pd
import numpy as np
df = pd.DataFrame({
'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
'amount': [25,25,26,40,41],
'verified': ['no','yes','yes','yes','yes']}
)
df = df[~((df["username"] == df["username"].shift(-1))
& (df["amount"].shift(-1) - df["amount"] <= 1))]
~ is "not", shift(-1) the next entry
