Subtract values with groupby in Pandas dataframe Python-CodePudding

I have a dataframe likes this:

Alliance_name	Company_name	TOAD	MBA	Class	EVE	TBD	Sur
Shinva group	HVC corp	8845	1135	0	12	12128	1
Shinva group	LDN corp	11	1243	133	121	113	1
Telegraph group	Freename LLC	5487	223	928	0	0	21
Telegraph group	Grt	0	7543	24	3213	15	21
Zero group	PetZoo crp	5574	0	2	0	6478	1
Zero group	Elephant	48324	0	32	118	4	1

I need to subtract values between cells in the column if they have the same Alliance_name. (it would be perfect not to subtract the last column Sur, but it is not the main target)

I know that for addition we can make something like this:

df = df.groupby('Alliance_name').sum()

But I don't know how to do this with subtraction.

The result should be like this (if we don't subtract the last column):

Alliance_name	Company_name	TOAD	MBA	Class	EVE	TBD	Sur
Shinva group	HVC corp LDN corp	8834	-108	-133	-109	12015	1
Telegraph group	Freename LLC Grt	5487	-7320	904	-3212	-15	21
Zero group	PetZoo crp Elephant	-42750	0	-30	-118	6474	1

Thanks for your help!

CodePudding user response：

You could invert the values to subtract, and then sum them.

df.loc[df.Alliance_name.duplicated(keep="first"), ["TOAD", "MBA", "Class", "EVE", "TBD", "Sur"]] *= -1
df.groupby("Alliance_name").sum()

CodePudding user response：

IIUC, you just want to subtract top from the bottom right? Then you can use groupby_diff and dropna to drop NaN values (since the first rows will be NaN). Then concatenate the result with "Company_name" and "Alliance_name".

Note that since diff finds the difference between top and bottom, we need the minus sign.

out = pd.concat([df.groupby('Alliance_name')['Company_name'].first().reset_index(), 
                 -df.drop(columns=['Company_name']).groupby('Alliance_name')
                 .diff()
                 .dropna()
                 .reset_index(drop=True)
                ], axis=1)

Output:

     Alliance_name  Company_name     TOAD     MBA  Class     EVE      TBD  Sur
0     Shinva group      HVC corp   8834.0  -108.0 -133.0  -109.0  12015.0 -0.0
1  Telegraph group  Freename LLC   5487.0 -7320.0  904.0 -3213.0    -15.0 -0.0
2       Zero group    PetZoo crp -42750.0    -0.0  -30.0  -118.0   6474.0 -0.0

CodePudding user response：

The .first() and .last() groupby methods can be useful for such tasks.

You can organize the columns you want to skip/compute

>>> df.columns
Index(['Alliance_name', 'Company_name', 'TOAD', 'MBA', 'Class', 'EVE', 'TBD',
   'Sur'],
  dtype='object')
>>> alliance, company, *cols, sur = df.columns
>>> groups = df.groupby(alliance)
>>> company = groups.first()[[company]]
>>> sur = groups.first()[sur]
>>> groups = groups[cols]

And use .first() - .last() directly:

>>> groups.first() - groups.last()
                  TOAD   MBA  Class   EVE    TBD
Alliance_name
Shinva group      8834  -108   -133  -109  12015
Telegraph group   5487 -7320    904 -3213    -15
Zero group      -42750     0    -30  -118   6474

Then .join() the other columns back in

>>> company.join(groups.first() - groups.last()).join(sur).reset_index()
     Alliance_name  Company_name   TOAD   MBA  Class   EVE    TBD  Sur
0     Shinva group      HVC corp   8834  -108   -133  -109  12015    1
1  Telegraph group  Freename LLC   5487 -7320    904 -3213    -15   21
2       Zero group    PetZoo crp -42750     0    -30  -118   6474    1

Another approach:

>>> df - df.drop(columns=['Company_name', 'Sur']) .groupby('Alliance_name').shift(-1)
   Alliance_name  Class  Company_name     EVE     MBA  Sur      TBD     TOAD
0            NaN -133.0           NaN  -109.0  -108.0  NaN  12015.0   8834.0
1            NaN    NaN           NaN     NaN     NaN  NaN      NaN      NaN
2            NaN  904.0           NaN -3213.0 -7320.0  NaN    -15.0   5487.0
3            NaN    NaN           NaN     NaN     NaN  NaN      NaN      NaN
4            NaN  -30.0           NaN  -118.0     0.0  NaN   6474.0 -42750.0
5            NaN    NaN           NaN     NaN     NaN  NaN      NaN      NaN

You can then drop the all nan rows and fill the remainder values from the original df.

>>> ((df - df.drop(columns=['Company_name', 'Sur'])
             .groupby('Alliance_name').shift(-1)).dropna(how='all')[df.columns].fillna(df))
     Alliance_name  Company_name   TOAD   MBA  Class   EVE    TBD  Sur
0     Shinva group      HVC corp   8834  -108   -133  -109  12015    1
2  Telegraph group  Freename LLC   5487 -7320    904 -3213    -15   21
4       Zero group    PetZoo crp -42750     0    -30  -118   6474    1