Home > database >  Sum selected columns for each level in Pandas Multiindex dataframe
Sum selected columns for each level in Pandas Multiindex dataframe

Time:02-02

I have a pandas dataframe like this:

import numpy as np
import pandas as pd

data = np.random.randint(0, 10, (4, 8))

multiindex = pd.MultiIndex.from_product([["a", "b"], [1, 2, 3, 4]])

df = pd.DataFrame(data, columns=multiindex)
df

lvl1    A               B
lvl2    a   b   c   d   a   b   c   d
0       3   5   3   6   3   6   4   9
1       9   6   1   9   7   1   5   1
2       6   5   5   6   0   9   7   7
3       3   0   1   4   7   1   4   8

For each unique entry in "lvl1" I want to build sums like x = a b and y = c d. The result should look like this:

    A       B
    x   y   x   y
0   7   16  9   1
1   8   6   1   8
2   8   12  10  9
3   7   7   11  7

I can reach this by iterating and concat like below, but there is probably a more elegant way with something like groupby, apply or agg.

list_df = []

sum_x = ["a", "b"]
sum_y = ["c", "d"]

for level in df.columns.get_level_values(level=0).unique():
    for sum_xy, title in zip([sum_x, sum_y], ["x", "y"]):
        df_sum = df.loc[:, (level, sum_xy)].sum(axis=1).rename((level, title))
        list_df.append(df_sum)

pd.concat(list_df, axis=1)

When I do a groupby with df.groupby("lvl1", axis=1).apply... I have the problem that the "lvl1" header remains and I don't know how to continue. Also I tried something with df.stack(level=0) to get rid of the multiindex without succes. I tried the approaches in these questions but could't find a working solution. Does anyone have an idea? Do I need a groupby at all or not? Thank you!

How to sum specific hierarchical columns in pandas?

Aggregating lambda functions in pandas and numpy

CodePudding user response:

One way is to build a dictionary, use it to map the second level, then you can groupby:

lookup = {c:'x' for c in sum_x}
lookup.update({c:'y' for c in sum_y})

df.groupby([df.columns.get_level_values(0),
            df.columns.get_level_values(1).map(lookup)],
           axis=1).sum()

Option: also

df.rename(columns=lookup, level=1).groupby(level=(0,1), axis=1).sum()

You would get something like this:

    A      B    
    x  y   x   y
0   8  5  14  14
1  11  7   9   1
2  16  3   7   5
3  10  2   1  12

CodePudding user response:

Lest try:

Preparation:

import numpy as np
import pandas as pd

data = np.array([[9, 7, 9, 6, 1, 3, 9, 6],
                 [3, 7, 6, 7, 0, 9, 1, 2],
                 [3, 3, 0, 8, 6, 3, 1, 5],
                 [2, 4, 0, 6, 2, 7, 6, 9]])

multiindex = pd.MultiIndex.from_product([["A", "B"], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(data, columns=multiindex)
sum_x = ["a", "b"]
sum_y = ["c", "d"]

Code:

  1. insert 'by hand' a level in the columns multiindex
df.columns = pd.MultiIndex.from_tuples([(tup[0], 'x' if tup[1] in sum_x else 'y' ,tup[1]) 
                                        for tup in df.columns])

Now df:

   A           B         
   x     y     x     y   
   a  b  c  d  a  b  c  d
0  9  7  9  6  1  3  9  6
1  3  7  6  7  0  9  1  2
2  3  3  0  8  6  3  1  5
3  2  4  0  6  2  7  6  9
  1. standard sum on columns multindex last level
df.stack().groupby(level=0).sum()

Result:

      A          B      
      x     y    x     y
0  16.0  15.0  4.0  15.0
1  10.0  13.0  9.0   3.0
2   6.0   8.0  9.0   6.0
3   6.0   6.0  9.0  15.0
  •  Tags:  
  • Related