I have a dataframe likes this:
| Alliance_name | Company_name | TOAD | MBA | Class | EVE | TBD | Sur |
|---|---|---|---|---|---|---|---|
| Shinva group | HVC corp | 8845 | 1135 | 0 | 12 | 12128 | 1 |
| Shinva group | LDN corp | 11 | 1243 | 133 | 121 | 113 | 1 |
| Telegraph group | Freename LLC | 5487 | 223 | 928 | 0 | 0 | 21 |
| Telegraph group | Grt | 0 | 7543 | 24 | 3213 | 15 | 21 |
| Zero group | PetZoo crp | 5574 | 0 | 2 | 0 | 6478 | 1 |
| Zero group | Elephant | 48324 | 0 | 32 | 118 | 4 | 1 |
I need to subtract values between cells in the column if they have the same Alliance_name. (it would be perfect not to subtract the last column Sur, but it is not the main target)
I know that for addition we can make something like this:
df = df.groupby('Alliance_name').sum()
But I don't know how to do this with subtraction.
The result should be like this (if we don't subtract the last column):
| Alliance_name | Company_name | TOAD | MBA | Class | EVE | TBD | Sur |
|---|---|---|---|---|---|---|---|
| Shinva group | HVC corp LDN corp | 8834 | -108 | -133 | -109 | 12015 | 1 |
| Telegraph group | Freename LLC Grt | 5487 | -7320 | 904 | -3212 | -15 | 21 |
| Zero group | PetZoo crp Elephant | -42750 | 0 | -30 | -118 | 6474 | 1 |
Thanks for your help!
CodePudding user response:
You could invert the values to subtract, and then sum them.
df.loc[df.Alliance_name.duplicated(keep="first"), ["TOAD", "MBA", "Class", "EVE", "TBD", "Sur"]] *= -1
df.groupby("Alliance_name").sum()
CodePudding user response:
IIUC, you just want to subtract top from the bottom right? Then you can use groupby_diff and dropna to drop NaN values (since the first rows will be NaN). Then concatenate the result with "Company_name" and "Alliance_name".
Note that since diff finds the difference between top and bottom, we need the minus sign.
out = pd.concat([df.groupby('Alliance_name')['Company_name'].first().reset_index(),
-df.drop(columns=['Company_name']).groupby('Alliance_name')
.diff()
.dropna()
.reset_index(drop=True)
], axis=1)
Output:
Alliance_name Company_name TOAD MBA Class EVE TBD Sur
0 Shinva group HVC corp 8834.0 -108.0 -133.0 -109.0 12015.0 -0.0
1 Telegraph group Freename LLC 5487.0 -7320.0 904.0 -3213.0 -15.0 -0.0
2 Zero group PetZoo crp -42750.0 -0.0 -30.0 -118.0 6474.0 -0.0
CodePudding user response:
The .first() and .last() groupby methods can be useful for such tasks.
You can organize the columns you want to skip/compute
>>> df.columns
Index(['Alliance_name', 'Company_name', 'TOAD', 'MBA', 'Class', 'EVE', 'TBD',
'Sur'],
dtype='object')
>>> alliance, company, *cols, sur = df.columns
>>> groups = df.groupby(alliance)
>>> company = groups.first()[[company]]
>>> sur = groups.first()[sur]
>>> groups = groups[cols]
And use .first() - .last() directly:
>>> groups.first() - groups.last()
TOAD MBA Class EVE TBD
Alliance_name
Shinva group 8834 -108 -133 -109 12015
Telegraph group 5487 -7320 904 -3213 -15
Zero group -42750 0 -30 -118 6474
Then .join() the other columns back in
>>> company.join(groups.first() - groups.last()).join(sur).reset_index()
Alliance_name Company_name TOAD MBA Class EVE TBD Sur
0 Shinva group HVC corp 8834 -108 -133 -109 12015 1
1 Telegraph group Freename LLC 5487 -7320 904 -3213 -15 21
2 Zero group PetZoo crp -42750 0 -30 -118 6474 1
Another approach:
>>> df - df.drop(columns=['Company_name', 'Sur']) .groupby('Alliance_name').shift(-1)
Alliance_name Class Company_name EVE MBA Sur TBD TOAD
0 NaN -133.0 NaN -109.0 -108.0 NaN 12015.0 8834.0
1 NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN 904.0 NaN -3213.0 -7320.0 NaN -15.0 5487.0
3 NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN -30.0 NaN -118.0 0.0 NaN 6474.0 -42750.0
5 NaN NaN NaN NaN NaN NaN NaN NaN
You can then drop the all nan rows and fill the remainder values from the original df.
>>> ((df - df.drop(columns=['Company_name', 'Sur'])
.groupby('Alliance_name').shift(-1)).dropna(how='all')[df.columns].fillna(df))
Alliance_name Company_name TOAD MBA Class EVE TBD Sur
0 Shinva group HVC corp 8834 -108 -133 -109 12015 1
2 Telegraph group Freename LLC 5487 -7320 904 -3213 -15 21
4 Zero group PetZoo crp -42750 0 -30 -118 6474 1
