I have the following df:
     A  B     C    D
0  foo  a  1200  300
0  foo  a   700  300
0  foo  b  1000  300
1  bar  b   270   70
1  bar  a   350   70
2  abc  c   270  300
2  abc  a   350  300
I want to display the sum of the values in column D grouped by column B, but I do not want to count a value in column D more than once for a single value in column A. That is, column D has only one value per value in column A.
foo will only ever have the value 300 and bar will only have the value 70 in column D. The values in this column are just repeated because I have repeated indexes.
I want to print something like (no need to show formatting, I just need to output the correct sums):
a: 300 (from foo) + 300 (from abc) + 70 (from bar) = 670
b: 300 (from foo) + 70 (from bar) = 370
c: 300 (from abc) = 300
That is, values in column D should not be summed together if the value in column A is the same among them.
CodePudding user response:
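You could use pd.unique() after the groupby and then sum the unique values in each group. Note that this deduplicates by the value of D rather than by column A, so it only gives the desired result when different values of A never share the same value of D within a group. In the data above, foo and abc both have 300, so group a comes out as 370 instead of 670.
import pandas as pd
df.groupby('B')['D'].apply(lambda x: sum(pd.unique(x)))
B
a 370
b 370
c 300
Name: D, dtype: int64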
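If different values of A can share the same value in D (as foo and abc do in the question's data), one alternative is to drop duplicate (A, B) pairs first and then sum D per group. This is only a minimal sketch: it rebuilds the frame by hand from the question, and it assumes D really is constant for each value of A, as the question states.

```python
import pandas as pd

# Rebuild the frame from the question (the index values repeat on purpose).
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Keep one row per (A, B) pair so each A contributes its D only once
# to a given B group, then sum D within each B group.
result = df.drop_duplicates(subset=["A", "B"]).groupby("B")["D"].sum()
print(result)
```

On the sample data this gives 670 for a, 370 for b and 300 for c, which matches the sums asked for in the question.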
