Counting pairs of rows in pandas

I have a DataFrame like this:

id  type
1   a
1   b
2   b
2   a
3   c
3   b

(Each id is guaranteed to have exactly 2 rows.)

I'd like to count how often each pair occurs, where a pair is the two types that belong to one id. For the table above, the expected result is:

pair  count
(a, b)   2
(b, c)   1

Thanks!

CodePudding user response：

You can use frozenset to get hashable, unordered objects that you can pass to value_counts:

df.groupby('id')['type'].agg(frozenset).value_counts()

output:

(a, b)    2
(b, c)    1
Name: type, dtype: int64
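For reference, here is a self-contained sketch of the same idea, with the sample frame rebuilt from the question. The names counts and lookup are only for illustration; converting to a dict is just one convenient way to look a pair up without caring about the order of its elements.

```python
import pandas as pd

# Sample data copied from the question: each id has exactly two rows.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "type": ["a", "b", "b", "a", "c", "b"],
})

# Collapse the types of each id into a frozenset, then count equal sets.
counts = df.groupby("id")["type"].agg(frozenset).value_counts()

# frozensets compare equal regardless of element order,
# so {"b", "a"} finds the same entry as {"a", "b"}.
lookup = counts.to_dict()
print(lookup[frozenset({"b", "a"})])  # 2
print(lookup[frozenset({"b", "c"})])  # 1
```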

Note that the index entries are frozensets. I recommend keeping them that way (and learning how to use them), but if you really need tuples:

out = df.groupby('id')['type'].agg(frozenset).value_counts()
out.index = out.index.map(tuple)
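One caveat worth sketching: the element order of a tuple built directly from a frozenset is not guaranteed, so sorting each set first gives predictable labels. Also, as far as I know, Index.map returns a MultiIndex when the mapped values are tuples, so the result is indexed by tuple keys rather than holding tuple objects in a flat index. A minimal variant, assuming the sample frame from the question:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "type": ["a", "b", "b", "a", "c", "b"],
})

out = df.groupby("id")["type"].agg(frozenset).value_counts()

# Sort each frozenset before converting, so every label is in a fixed order.
out.index = out.index.map(lambda s: tuple(sorted(s)))

# The keys are now plain sorted tuples.
print(out.to_dict()[("a", "b")])  # 2
```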

CodePudding user response：

You can sort by type and then aggregate the types of each id into a tuple:

pair = df.sort_values('type').groupby('id').agg(tuple)

and then group by this new column:

print(pair.groupby('type').size())

This gives:

type
(a, b)    2
(b, c)    1
dtype: int64

The sort ensures that you never get (b, a): it always becomes (a, b), so both orderings end up in the same group. If order matters, remove the sort.
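To make the effect of the sort concrete, here is a small sketch that runs both variants on the sample frame from the question. The names sorted_counts and ordered_counts are only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "type": ["a", "b", "b", "a", "c", "b"],
})

# With the sort: ids 1 and 2 both become ('a', 'b') and are counted together.
pair = df.sort_values("type").groupby("id").agg(tuple)
sorted_counts = pair.groupby("type").size()
print(sorted_counts.to_dict())

# Without the sort: row order inside each id is kept, so ('a', 'b') and
# ('b', 'a') are treated as different pairs.
ordered = df.groupby("id").agg(tuple)
ordered_counts = ordered.groupby("type").size()
print(ordered_counts.to_dict())
```

On this data the first variant yields two pairs with counts 2 and 1, while the second yields three distinct pairs with a count of 1 each.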