How can I groupby a DataFrame at the same time I count the values and put in different columns?-CodePudding

I have a DataFrame that looks like the one below

Index  Category  Class
 0        1        A
 1        1        A
 2        1        B
 3        2        A
 4        3        B
 5        3        B

And I would like to get an output data frame that groups by category and have one column for each of the classes with the counting of the occurrences of that class in each category, such as the one below

Index Category   A   B
 0      1        2   1
 1      2        1   0
 2      3        0   2

So far I've tried various combinations of the groupby and agg methods, but I still can't get what I want. I've also tried df.pivot_table(index='Category', columns='Class', aggfunc='count'), but that return a DataFrame without columns. Any ideas of what could work in this case?

CodePudding user response：

You can use aggfunc="size" to achieve your desired result:

>>> df.pivot_table(index='Category', columns='Class', aggfunc='size', fill_value=0)

Class     A  B
Category
1         2  1
2         1  0
3         0  2

Alternatively, you can use .groupby(...).size() to get the counts, and then unstack to reshape your data as well:

>>> df.groupby(["Category", "Class"]).size().unstack(fill_value=0)

Class     A  B
Category
1         2  1
2         1  0
3         0  2

CodePudding user response：

Assign a dummy value to count:

out = df.assign(val=1).pivot_table('val', 'Category', 'Class',
                                   aggfunc='count', fill_value=0).reset_index()
print(out)

# Output
Class  Category  A  B
0             1  2  1
1             2  1  0
2             3  0  2

CodePudding user response：

    import pandas as pd
df = pd.DataFrame({'Index':[0,1,2,3,4,5],
                   'Category': [1,1,1,2,3,3],
                   'Class':['A','A','B','A','B','B'],
              })
df = df.groupby(['Category', 'Class']).count()
df = df.pivot_table(index='Category', columns='Class')
print(df)

output:

             Index     
Class        A    B
Category           
1          2.0  1.0
2          1.0  NaN
3          NaN  2.0

CodePudding user response：

Use crosstab:

pd.crosstab(df['Category'], df['Class']).reset_index()

output:

Class  Category  A  B
0             1  2  1
1             2  1  0
2             3  0  2