Home > Back-end >  Subtract values with groupby in Pandas dataframe Python
Subtract values with groupby in Pandas dataframe Python

Time:01-27

I have a dataframe likes this:

Alliance_name Company_name TOAD MBA Class EVE TBD Sur
Shinva group HVC corp 8845 1135 0 12 12128 1
Shinva group LDN corp 11 1243 133 121 113 1
Telegraph group Freename LLC 5487 223 928 0 0 21
Telegraph group Grt 0 7543 24 3213 15 21
Zero group PetZoo crp 5574 0 2 0 6478 1
Zero group Elephant 48324 0 32 118 4 1

I need to subtract values between cells in the column if they have the same Alliance_name. (it would be perfect not to subtract the last column Sur, but it is not the main target)

I know that for addition we can make something like this:

df = df.groupby('Alliance_name').sum()

But I don't know how to do this with subtraction.

The result should be like this (if we don't subtract the last column):

Alliance_name Company_name TOAD MBA Class EVE TBD Sur
Shinva group HVC corp LDN corp 8834 -108 -133 -109 12015 1
Telegraph group Freename LLC Grt 5487 -7320 904 -3212 -15 21
Zero group PetZoo crp Elephant -42750 0 -30 -118 6474 1

Thanks for your help!

CodePudding user response:

You could invert the values to subtract, and then sum them.

df.loc[df.Alliance_name.duplicated(keep="first"), ["TOAD", "MBA", "Class", "EVE", "TBD", "Sur"]] *= -1
df.groupby("Alliance_name").sum()

CodePudding user response:

IIUC, you just want to subtract top from the bottom right? Then you can use groupby_diff and dropna to drop NaN values (since the first rows will be NaN). Then concatenate the result with "Company_name" and "Alliance_name".

Note that since diff finds the difference between top and bottom, we need the minus sign.

out = pd.concat([df.groupby('Alliance_name')['Company_name'].first().reset_index(), 
                 -df.drop(columns=['Company_name']).groupby('Alliance_name')
                 .diff()
                 .dropna()
                 .reset_index(drop=True)
                ], axis=1)

Output:

     Alliance_name  Company_name     TOAD     MBA  Class     EVE      TBD  Sur
0     Shinva group      HVC corp   8834.0  -108.0 -133.0  -109.0  12015.0 -0.0
1  Telegraph group  Freename LLC   5487.0 -7320.0  904.0 -3213.0    -15.0 -0.0
2       Zero group    PetZoo crp -42750.0    -0.0  -30.0  -118.0   6474.0 -0.0

CodePudding user response:

The .first() and .last() groupby methods can be useful for such tasks.

You can organize the columns you want to skip/compute

>>> df.columns
Index(['Alliance_name', 'Company_name', 'TOAD', 'MBA', 'Class', 'EVE', 'TBD',
   'Sur'],
  dtype='object')
>>> alliance, company, *cols, sur = df.columns
>>> groups = df.groupby(alliance)
>>> company = groups.first()[[company]]
>>> sur = groups.first()[sur]
>>> groups = groups[cols]

And use .first() - .last() directly:

>>> groups.first() - groups.last()
                  TOAD   MBA  Class   EVE    TBD
Alliance_name
Shinva group      8834  -108   -133  -109  12015
Telegraph group   5487 -7320    904 -3213    -15
Zero group      -42750     0    -30  -118   6474

Then .join() the other columns back in

>>> company.join(groups.first() - groups.last()).join(sur).reset_index()
     Alliance_name  Company_name   TOAD   MBA  Class   EVE    TBD  Sur
0     Shinva group      HVC corp   8834  -108   -133  -109  12015    1
1  Telegraph group  Freename LLC   5487 -7320    904 -3213    -15   21
2       Zero group    PetZoo crp -42750     0    -30  -118   6474    1

Another approach:

>>> df - df.drop(columns=['Company_name', 'Sur']) .groupby('Alliance_name').shift(-1)
   Alliance_name  Class  Company_name     EVE     MBA  Sur      TBD     TOAD
0            NaN -133.0           NaN  -109.0  -108.0  NaN  12015.0   8834.0
1            NaN    NaN           NaN     NaN     NaN  NaN      NaN      NaN
2            NaN  904.0           NaN -3213.0 -7320.0  NaN    -15.0   5487.0
3            NaN    NaN           NaN     NaN     NaN  NaN      NaN      NaN
4            NaN  -30.0           NaN  -118.0     0.0  NaN   6474.0 -42750.0
5            NaN    NaN           NaN     NaN     NaN  NaN      NaN      NaN

You can then drop the all nan rows and fill the remainder values from the original df.

>>> ((df - df.drop(columns=['Company_name', 'Sur'])
             .groupby('Alliance_name').shift(-1)).dropna(how='all')[df.columns].fillna(df))
     Alliance_name  Company_name   TOAD   MBA  Class   EVE    TBD  Sur
0     Shinva group      HVC corp   8834  -108   -133  -109  12015    1
2  Telegraph group  Freename LLC   5487 -7320    904 -3213    -15   21
4       Zero group    PetZoo crp -42750     0    -30  -118   6474    1
  •  Tags:  
  • Related