Home > Software engineering >  Pandas get equivalent positive number
Pandas get equivalent positive number

Time:01-06

I have data similar to this

data = {'A': [10,20,30,10,-10], 'B': [100,200,300,100,-100], 'C':[1000,2000,3000,1000, -1000]}
df = pd.DataFrame(data)

df
Index A B C
0 10 100 1000
1 20 200 2000
2 30 300 3000
3 10 100 1000
4 -10 -100 -1000

Here index value 0,3 and 4 are exactly equal but one is negative, so for such scenarios I want a fourth column D to be populated with a value 'Exact opposite' for index value 3 and 4.(Any one value)

Similar to

Index A B C D
0 10 100 1000
1 20 200 2000
2 30 300 3000
3 10 100 1000 Exact opposite
4 -10 -100 -1000 Exact opposite

One approach I can think of is by adding a column which adds values of all the columns

column_names=['A','B','C']
df['Sum Val'] = df[column_names].sum(axis=1)
Index A B C Sum val
0 10 100 1000 1110
1 20 200 2000 2200
2 30 300 3000 3300
3 10 100 1000 1110
4 -10 -100 -1000 -1110

and then check if there are any negative values and try to find out the corresponding equal positive value but could not proceed from there

Any mistakes please pardon.

CodePudding user response:

Like this maybe:

In [69]: import numpy as np

# Create column 'D' with exact duplicate rows using 'abs'
In [68]: df['D'] = np.where(df.abs().duplicated(keep=False), 'Duplicate', '') 

# If the sum of duplicated rows = 0, this means they are 'exact opposite'
In [78]: if df[df.D.eq('Duplicate')].sum(1).sum() == 0:
    ...:     df.loc[ix, 'D'] = 'Exact Opposite'
    ...: 

In [79]: df
Out[79]: 
    A    B     C               D
0  10  100  1000  Exact Opposite
1  20  200  2000                
2  30  300  3000                
3 -10 -100 -1000  Exact Opposite

CodePudding user response:

To follow your logic let us just adding abs with groupby , so the output will return the pair index as list

df.reset_index().groupby(df['Sum Val'].abs())['index'].agg(list)
Out[367]: 
Sum Val
1110    [0, 3]
2220       [1]
3330       [2]
Name: index, dtype: object

CodePudding user response:

import pandas as pd
data = {'A': [10, 20, 30, -10], 'B': [100, 200,300, -100], 'C': [1000, 2000, 3000,-1000]}
df = pd.DataFrame(data)
print(df)

df['total'] = df.sum(axis=1)
df['total'] = df['total'].apply(lambda x: "Exact opposite" if sum(df['total'] == -1*x) else "")
print(df)
  •  Tags:  
  • Related