I have a table in which an email address (column 1) can be associated with multiple instances (column 2). For instance, [email protected] is linked to instance IDs 158 and 274.
In pandas, how do I look up how many instances, and which instances, are associated with a particular email address?
CodePudding user response:
Try this:
import pandas as pd
# Load the email-to-instance mapping
df = pd.read_csv('./data/mails.csv')
df
       email address  instance ID
0  [email protected]          158
1  [email protected]          189
2  [email protected]          274
3  [email protected]          274
4  [email protected]          274
5  [email protected]          274
6  [email protected]          274
7  [email protected]          200
# Count the rows (instances) recorded for each email address
print(df.groupby(by='email address').count())
                   instance ID
email address
[email protected]            1
[email protected]            2
[email protected]            1
[email protected]            1
[email protected]            1
[email protected]            2
# List the (email address, instance ID) pairs
df.pivot_table(index=['email address', 'instance ID'])
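If you only care about one particular address, a boolean filter may be simpler than grouping the whole table. The following is a minimal sketch, not the only way to do it: the DataFrame is hypothetical stand-in data with placeholder addresses (the real ones are not visible in the question), and `target` is an illustrative variable name. Only the column names `email address` and `instance ID` are taken from the table above.

```python
import pandas as pd

# Hypothetical stand-in data; the addresses are placeholders,
# not the ones from the question.
df = pd.DataFrame({
    "email address": ["a@example.com", "b@example.com",
                      "a@example.com", "c@example.com"],
    "instance ID": [158, 189, 274, 274],
})

# Hypothetical address to look up
target = "a@example.com"

# Which instances are linked to this one address (duplicates dropped)
instances = df.loc[df["email address"] == target, "instance ID"].unique().tolist()

# How many distinct instances that is
count = len(instances)

# The same two answers for every address at once
per_email_count = df.groupby("email address")["instance ID"].nunique()
per_email_ids = df.groupby("email address")["instance ID"].unique()

print(target, count, instances)
print(per_email_count)
print(per_email_ids)
```

Note that `nunique()` counts distinct instance IDs, whereas `count()` counts rows, so the two differ if the same email/instance pair appears more than once in the file.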


