I have data similar to this
data = {'A': [10,20,30,10,-10], 'B': [100,200,300,100,-100], 'C':[1000,2000,3000,1000, -1000]}
df = pd.DataFrame(data)
df
| Index | A | B | C |
|---|---|---|---|
| 0 | 10 | 100 | 1000 |
| 1 | 20 | 200 | 2000 |
| 2 | 30 | 300 | 3000 |
| 3 | 10 | 100 | 1000 |
| 4 | -10 | -100 | -1000 |
Here index value 0,3 and 4 are exactly equal but one is negative, so for such scenarios I want a fourth column D to be populated with a value 'Exact opposite' for index value 3 and 4.(Any one value)
Similar to
| Index | A | B | C | D |
|---|---|---|---|---|
| 0 | 10 | 100 | 1000 | |
| 1 | 20 | 200 | 2000 | |
| 2 | 30 | 300 | 3000 | |
| 3 | 10 | 100 | 1000 | Exact opposite |
| 4 | -10 | -100 | -1000 | Exact opposite |
One approach I can think of is by adding a column which adds values of all the columns
column_names=['A','B','C']
df['Sum Val'] = df[column_names].sum(axis=1)
| Index | A | B | C | Sum val |
|---|---|---|---|---|
| 0 | 10 | 100 | 1000 | 1110 |
| 1 | 20 | 200 | 2000 | 2200 |
| 2 | 30 | 300 | 3000 | 3300 |
| 3 | 10 | 100 | 1000 | 1110 |
| 4 | -10 | -100 | -1000 | -1110 |
and then check if there are any negative values and try to find out the corresponding equal positive value but could not proceed from there
Any mistakes please pardon.
CodePudding user response:
Like this maybe:
In [69]: import numpy as np
# Create column 'D' with exact duplicate rows using 'abs'
In [68]: df['D'] = np.where(df.abs().duplicated(keep=False), 'Duplicate', '')
# If the sum of duplicated rows = 0, this means they are 'exact opposite'
In [78]: if df[df.D.eq('Duplicate')].sum(1).sum() == 0:
...: df.loc[ix, 'D'] = 'Exact Opposite'
...:
In [79]: df
Out[79]:
A B C D
0 10 100 1000 Exact Opposite
1 20 200 2000
2 30 300 3000
3 -10 -100 -1000 Exact Opposite
CodePudding user response:
To follow your logic let us just adding abs with groupby , so the output will return the pair index as list
df.reset_index().groupby(df['Sum Val'].abs())['index'].agg(list)
Out[367]:
Sum Val
1110 [0, 3]
2220 [1]
3330 [2]
Name: index, dtype: object
CodePudding user response:
import pandas as pd
data = {'A': [10, 20, 30, -10], 'B': [100, 200,300, -100], 'C': [1000, 2000, 3000,-1000]}
df = pd.DataFrame(data)
print(df)
df['total'] = df.sum(axis=1)
df['total'] = df['total'].apply(lambda x: "Exact opposite" if sum(df['total'] == -1*x) else "")
print(df)
