Let's say we have the following df:
| id | A | B | C | D |
|---|---|---|---|---|
| 123 | 1 | 1 | 0 | 0 |
| 456 | 0 | 1 | 1 | 0 |
| 786 | 1 | 0 | 0 | 0 |
The id column represents a unique client.
Columns A, B, C, and D represent a product. These columns' values are binary.
1 means the client has that product.
0 means the client doesn't have that product.
I want to create a matrix table of sorts that counts the number of combinations of products that exist for all users.
This would be the desired output, given the df provided above:
| A | B | C | D | |
|---|---|---|---|---|
| A | 2 | 1 | 0 | 0 |
| B | 0 | 2 | 1 | 0 |
| C | 0 | 1 | 1 | 0 |
| D | 0 | 0 | 1 | 0 |
CodePudding user response:
import pandas as pd
df = pd.read_fwf('table.dat', infer_nrows=1001)
cols = ['A', 'B', 'C', 'D']
df2 = df[cols]
df2.T.dot(df2)
Result:
A B C D
A 2 1 0 0
B 1 2 1 0
C 0 1 1 0
D 0 0 0 0
CodePudding user response:
I think you want a dot product:
df2 = df.set_index('id')
out = df2.T.dot(df2)
Output:
A B C D
A 2 1 0 0
B 1 2 1 0
C 0 1 1 0
D 0 0 0 0
