groupby with multi level column index python-CodePudding

I have a dataframe with a multi level column index. I want to get the cross tabulations for the g1, g2 columns (per level 1 index (1,2)), grouped by the group column (a, b). I thought I could get away with just calling the top level columns but I am a bit stuck. the dataframe that I would expect as output in the end is in d2 below. All comments are welcome, Thanks alot

# the dataframe that I have 
d1 = pd.DataFrame((['i1', 'a', 'dog', 'mouse','cat','mouse'],['i2','a','cat','mouse','dog','dog'],['i3', 'a', 'dog', 'dog','cat','dog'],['i4','b','cat','dog','dog','cat']), columns = pd.MultiIndex.from_tuples(list(zip(*[['id','group','g1','g1','g2','g2'], ['-','-','1','2','1','2']]))))

# what I thought would work...
d1 = d1.set_index('id')
d1.groupby(['group'])['g1'].value_counts()


# the dataframe that I would like to have 
d2 = pd.DataFrame((['a', 'dog', 2,1,1,2],['a','mouse',0,2,0,1],['a','cat',1,0,2,0],['b','cat',1,0,1,1],['b','dog',0,1,1,1]), columns = pd.MultiIndex.from_tuples(list(zip(*[['group','category','g1','g1','g2','g2'], ['-','-','1','2','1','2']]))))

CodePudding user response：

I would recommend restructuring d1 a bit first...

d1 = d1.set_index([('id','-'),('group','-')]).stack([0,1]).reset_index()
d1.columns = ['id','group','level_1','level_2','category']

    id group level_1 level_2 category
0   i1     a      g1       1      dog
1   i1     a      g1       2    mouse
2   i1     a      g2       1      cat
3   i1     a      g2       2    mouse
4   i2     a      g1       1      cat
5   i2     a      g1       2    mouse
6   i2     a      g2       1      dog
7   i2     a      g2       2      dog
8   i3     a      g1       1      dog
9   i3     a      g1       2      dog
10  i3     a      g2       1      cat
11  i3     a      g2       2      dog
12  i4     b      g1       1      cat
13  i4     b      g1       2      dog
14  i4     b      g2       1      dog
15  i4     b      g2       2      cat

...and then using either pivot_table or groupby (result is the same)...

# pivot_table
d2 = pd.pivot_table(d1, index=['group', 'category'], columns=['level_1','level_2'], aggfunc='count', fill_value=0).droplevel(0, axis=1).rename_axis([None,None], axis=1)

# groupby
d2 = d1.groupby(['group','category','level_1','level_2'])['id'].count().unstack(['level_1','level_2'], fill_value=0).rename_axis([None,None], axis=1).sort_index(axis=1)

               g1    g2   
                1  2  1  2
group category            
a     cat       1  0  2  0
      dog       2  1  1  2
      mouse     0  2  0  1
b     cat       1  0  0  1
      dog       0  1  1  0