I have a DataFrame like this:
id type
1 a
1 b
2 b
2 a
3 c
3 b
(Each ID has only 2 rows for sure)
I'd like to count how often each pair occurs, where a pair is the two types that share an ID. For the table above, the result should be:
pair count
(a, b) 2
(b, c) 1
Thanks!
CodePudding user response:
You can use frozenset to have hashable, unordered objects to pass to value_counts:
df.groupby('id')['type'].agg(frozenset).value_counts()
output:
(a, b) 2
(b, c) 1
Name: type, dtype: int64
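Note that the objects in the index are frozensets. I recommend keeping it this way (and learning how to use them), but if you really need tuples, convert them; sorting each set first keeps the element order deterministic, since a frozenset has no defined order:
out = df.groupby('id')['type'].agg(frozenset).value_counts()
out.index = out.index.map(lambda s: tuple(sorted(s)))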
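For reference, here is a minimal self-contained sketch of the frozenset approach, assuming the sample data from the question. The names `pairs`, `counts` and `as_tuples` are only illustrative, and the final step builds a plain dict keyed by sorted tuples rather than modifying the index:

```python
import pandas as pd

# Sample data copied from the question; each id has exactly two rows.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "type": ["a", "b", "b", "a", "c", "b"],
})

# One frozenset per id, so {a, b} and {b, a} are the same key.
pairs = df.groupby("id")["type"].agg(frozenset)

# Count how many ids share each unordered pair.
counts = pairs.value_counts()

# Optional: a plain dict keyed by sorted tuples (illustrative helper step).
as_tuples = {tuple(sorted(pair)): int(n) for pair, n in counts.items()}
print(as_tuples)
```

Sorting inside `tuple(sorted(pair))` is a deliberate choice here: it gives every pair a stable element order regardless of how the frozenset happens to iterate.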
CodePudding user response:
You can sort by type and then aggregate the types of each id into a tuple:
pair = df.sort_values('type').groupby('id').agg(tuple)
and then group by this new column:
print(pair.groupby('type').size())
Which gives
type
(a, b) 2
(b, c) 1
dtype: int64
The sort ensures that you never get (b, a); it always becomes (a, b), so equal pairs are always grouped together. If order matters, remove the sort.
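Put together, the sort-then-tuple approach might look like the sketch below. It assumes the sample data from the question, and the variable names `pair` and `counts` are just for illustration:

```python
import pandas as pd

# Sample data copied from the question; each id has exactly two rows.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "type": ["a", "b", "b", "a", "c", "b"],
})

# Sorting by type first means each id's tuple is in alphabetical order,
# so (b, a) can never appear as a separate key from (a, b).
pair = df.sort_values("type").groupby("id").agg(tuple)

# Group the per-id tuples and count how many ids fall into each group.
counts = pair.groupby("type").size()
print(counts)
```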
