DF csv
This CSV and i am using it as Dataframe
colA,colB,colC
ABC,3,token
ABC,50,added
ABC,-50,deleted
xyz,20,token
pqr,50,added
pqr,-50,deleted
lmn,50,added
output
colA,colB,colC
ABC,3,token
xyz,20,token
lmn,50,added
CodePudding user response:
Methods based on abs would incorrectly remove two positive or two negative values.
I suggest to perform a self-merge using the opposite of colB:
# get indices that have a matching positive/negative
idx = (df.reset_index()
.merge(df, left_on=['colA', 'colB'], right_on=['colA', -df['colB']],
how='inner')['index']
)
# [1, 2, 4, 5] (as list)
# drop them
out = df.drop(idx)
output:
colA colB colC
0 ABC 3 token
3 xyz 20 token
6 lmn 50 added
CodePudding user response:
You want to identify all the 'positive and negative' by taking the absolute value. Then drop duplicates based only on colB (ignoring the text in colC). Don't keep any (keep=False) to make sure both are dropped.
df.abs().drop_duplicates(subset='colB', keep=False)
A small advice, given this is your first post: Please always try to put complete code examples, and make sure you ask a clear question. That makes it easier for other people to understand your problem and reply. Also, if this answers your question, don't forget to "accept" this answer.
CodePudding user response:
not sure exactly what you mean, this gets the desired output though:
df['abs_colB'] = np.abs(df['colB'])
df.drop_duplicates('abs_colB', keep='last').iloc[:, :-1].reset_index(drop=True)
| colA | colB | colC | |
|---|---|---|---|
| 0 | ABC | 3 | token |
| 1 | xyz | 20 | token |
| 2 | lmn | 50 | added |
