SUMIF Function on a Pandas DataFrame-CodePudding

Background -
I have a pandas DataFrame, containing data that I would like to do some calculation on. Specifically, I need to iterate through every row of the DataFrame and compute a calculation, which if in Excel, would be =SUMIF.

Calculation (=SUMIFS([% Ownership],[Entity ID],[@[Entity ID]])) -
In this calculation, the integer value from the Entity ID column is taken, and for every instance of that Entity ID located in the Entity ID column throughout the DataFrame, it's corresponding % Ownership value is summed and printed in the Entity ID % Column.

>>> df
    Entity ID    % Ownership    Entity ID %
0   12345        100.00         1.000000
1   45643        49.56          0.495551
2   00000        100.00         2.000000
3   00000        100.00         2.000000

In the above example:

12345 only appears once in the DataFrame, therefore its Entity ID value is 1.000000.
45643 only appears once in the DataFrame (with a 49.56 value), therefore its Entity ID value is 0.495551.
00000 appears twice in the DataFrame (both instance with 1.0 values) therefore its Entity ID value is 2.000000.

Useful info:

You will note that the data formatting in the % Ownership and Entity ID % columns is different (e.g., 100.00 vs 1.00000), this is how the cells are formatted in my Excel workbook, and I would ideally like to use this formatting and decimal rounding if possible.
I would like to understand how to iterate through every row in my DataFrame (it's 4000 rows) and perform such a calculation using a for loop, so to effectively perform this calculation for every row.

CodePudding user response：

We can use a groupby to get the sum and assign the result directly into the DataFrame like so :

>>> df.assign(Entity_ID_perc = df.groupby('Entity ID')['% Ownership'].sum() / 100)
    Entity ID   % Ownership Entity_ID_perc
0   12345       100.00      1.0000
1   45643       49.56       0.4956
2   00000       100.00      2.0000
3   00000       100.00      2.0000