Count combinations of values in a pandas DataFrame

Let's say we have the following df:

The id column represents a unique client.

Columns A, B, C, and D each represent a product. These columns' values are binary.

1 means the client has that product. 0 means the client doesn't have that product.

I want to create a matrix that counts, for every pair of products, how many clients have both products.

This would be the desired output, given the df provided above:

CodePudding user response：

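import pandas as pd

# Read the fixed-width table; column widths are inferred from up to 1001 rows
df = pd.read_fwf('table.dat', infer_nrows=1001)
cols = ['A', 'B', 'C', 'D']
df2 = df[cols]  # keep only the product columns, dropping id
# Entry (X, Y) counts the clients that have both product X and product Y
df2.T.dot(df2)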

Result:

    A   B   C   D
A   2   1   0   0
B   1   2   1   0
C   0   1   1   0
D   0   0   0   0
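
Since 'table.dat' is not included in the thread, here is a minimal self-contained sketch of the same approach. The three-client frame below is hypothetical (it is not the asker's original data) and was chosen only so that it reproduces the result shown above.

```python
import pandas as pd

# Hypothetical data (assumption): three clients, binary product flags
df = pd.DataFrame({
    "id": [1, 2, 3],
    "A": [1, 1, 0],
    "B": [0, 1, 1],
    "C": [0, 0, 1],
    "D": [0, 0, 0],
})

cols = ["A", "B", "C", "D"]
df2 = df[cols]  # product columns only

# Transposed frame times the frame itself gives a product-by-product matrix
out = df2.T.dot(df2)
print(out)
```

The diagonal holds the number of clients per product (for example, two clients have A), and each off-diagonal entry holds the number of clients that have both products of that pair.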

CodePudding user response：

I think you want a dot product:

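# Move id into the index so that only the product columns remain
df2 = df.set_index('id')

# Entry (X, Y) counts the clients that have both product X and product Y
out = df2.T.dot(df2)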

Output:

   A  B  C  D
A  2  1  0  0
B  1  2  1  0
C  0  1  1  0
D  0  0  0  0
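
Why the dot product works: entry (X, Y) is the sum over clients of X times Y, and with binary values that term is 1 only when a client has both products. The sketch below checks this against an explicit pairwise count. The data and the name `manual` are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Hypothetical data (assumption): three clients, binary product flags
df2 = pd.DataFrame({
    "id": [1, 2, 3],
    "A": [1, 1, 0],
    "B": [0, 1, 1],
    "C": [0, 0, 1],
    "D": [0, 0, 0],
}).set_index("id")

# Dot-product version
out = df2.T.dot(df2)

# Explicit version: for each pair, count the rows where both flags are 1
manual = pd.DataFrame(
    {
        y: {x: int(((df2[x] == 1) & (df2[y] == 1)).sum()) for x in df2.columns}
        for y in df2.columns
    }
)

# Both approaches should agree cell by cell
print((manual.values == out.values).all())
```

The explicit loop is easier to read but does one pass per pair of columns; the dot product computes all pairs in a single matrix multiplication.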