I have a df like this:
| Foo | Bar |
|---|---|
| A | 54 |
| A | 95 |
| B | 49 |
| A | 25 |
| B | 25 |
| B | 32 |
| B | 55 |
| A | 98 |
| A | 83 |
| B | 90 |
I am computing some statistics using [aggfunc][1]:
temp = pd.DataFrame.from_dict({'Foo': {0: 'A',
1: 'A',
2: 'B',
3: 'A',
4: 'B',
5: 'B',
6: 'B',
7: 'A',
8: 'A',
9: 'B'},
'Bar': {0: 54, 1: 95, 2: 49, 3: 25, 4: 25, 5: 32, 6: 55, 7: 98, 8: 83, 9: 90}})
res_df = temp.groupby('Foo').agg(
max_val = pd.NamedAgg(column='Bar', aggfunc='max'),
min_val = pd.NamedAgg(column='Bar', aggfunc='min'),
std_val = pd.NamedAgg(column='Bar', aggfunc='std')
)
Which results (correctly) in:
| Foo | max_val | min_val | std_val |
|---|---|---|---|
| A | 98 | 25 | 31.0403 |
| B | 90 | 25 | 25.37124 |
However, I want to add another column which is proportion of total, so for example:
| Foo | max_val | min_val | std_val | pct_total |
|---|---|---|---|---|
| A | 98 | 25 | 31.0403 | 0.59 |
| B | 90 | 25 | 25.37124 | 0.41 |
How can I accomplish this using aggfunc?
[1]: https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html
CodePudding user response:
You could try as follows:
res_df = temp.groupby('Foo').agg(
max_val = pd.NamedAgg(column='Bar', aggfunc='max'),
min_val = pd.NamedAgg(column='Bar', aggfunc='min'),
std_val = pd.NamedAgg(column='Bar', aggfunc='std'),
pct_total = pd.NamedAgg(column='Bar', aggfunc=lambda x: x.sum()/temp.Bar.sum())
)
print(res_df)
max_val min_val std_val pct_total
Foo
A 98 25 31.040296 0.585809
B 90 25 25.371244 0.414191
Or use round(x.sum()/temp.Bar.sum(),2) if you want them rounded: 0.59, 0.41.
