Home > database >  Converting groupby pandas df of absolute numbers to percentage of row totals
Converting groupby pandas df of absolute numbers to percentage of row totals

Time:01-22

I have some data in my df df that shows the 2 categories a user belongs to. For which I want to see the number of users for each category pair expressed as a %total of the row.

Original dataframe df:


 ------ ------ -------- 
| cat1 | cat2 |  user  |
 ------ ------ -------- 
| A    | X    | 687568 |
| A    | Y    |  68575 |
| B    | Y    |  56478 |
| A    | X    |   6587 |
| A    | Y    |  45678 |
| B    | X    |   5678 |
| B    | X    |    967 |
| A    | X    |    345 |
 ------ ------ -------- 

I convert this to a groupby df using: df2 = df.groupby(['cat1', 'cat2']).agg({'user': 'nunique'}).reset_index().pivot(index='cat1', columns='cat2',values='user')to get the pairwise calculation of number of users per combination of categories (numbers here are made up):

 ------ ---- ---- 
| cat2 | X  | Y  |
 ------ ---- ---- 
| cat1 |    |    |
 ------ ---- ---- 
| A    |  5 |  5 |
| B    | 10 | 40 |
 ------ ---- ---- 

And I would like to convert the numbers to percent totals of the rows (Cat1), e.g. for the first row, 5/(5 5) = 0.5 and so on to give:

 ------ ----- ----- 
| cat2 |  X  |  Y  |
 ------ ----- ----- 
| cat1 |     |     |
| A    | 0.5 | 0.5 |
| B    | 0.2 | 0.8 |
 ------ ----- ----- 

Would I have to create a new column in my grouped df that contains the row-wise sum, and then iterate through each value in a row and divide it by that total?

CodePudding user response:

You can simplify your expression:

piv = df.pivot_table('user', 'cat1', 'cat2', aggfunc='nunique')
pct = piv.div(piv.sum(axis=1), axis=0)

Output:

>>> piv
cat2  X  Y
cat1      
A     3  2
B     2  1

>>> pct
cat2         X         Y
cat1                    
A     0.600000  0.400000
B     0.666667  0.333333
  •  Tags:  
  • Related