I have a table like this:
| image | user |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 3 | 2 |
| 2 | 3 |
| 3 | 3 |
| ... | ... |
Now I want to find shared images between two users. For example, when I ask for the images shared between user 1 and user 2, I want [1, 3] as a result.
How could I achieve this?
CodePudding user response:
You can do that by collecting each user's images into a set and intersecting the two sets:
seen = df.groupby("user")["image"].apply(set)
shared = list(seen[1].intersection(seen[2]))
print(shared)
Output:
[1, 3]
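For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the DataFrame holds only the rows shown in the question's table (the "..." row is left out), and `shared_images` is a hypothetical helper name, not something from pandas:

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumption: the "..." rows are omitted).
df = pd.DataFrame({
    "image": [1, 2, 3, 1, 3, 2, 3],
    "user":  [1, 1, 1, 2, 2, 3, 3],
})

def shared_images(df, user_a, user_b):
    """Hypothetical helper: sorted list of images that both users have."""
    # tolist() converts the values to plain Python ints before building the sets.
    images_a = set(df.loc[df["user"] == user_a, "image"].tolist())
    images_b = set(df.loc[df["user"] == user_b, "image"].tolist())
    return sorted(images_a & images_b)

print(shared_images(df, 1, 2))
```

Sorting the intersection keeps the result order stable, since sets do not guarantee any ordering.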
CodePudding user response:
Expanding on @BrendanA's answer, you can use itertools.combinations to get all 2-combinations, find their shared images and cast the result to a DataFrame:
from itertools import combinations
import pandas as pd
# Map each user to the set of images they have.
users_to_images = df.groupby('user')['image'].agg(set)
# For every pair of users, intersect their image sets.
data = {(i, j): [list(users_to_images[i].intersection(users_to_images[j]))] for i, j in combinations(users_to_images.index, 2)}
out = pd.DataFrame.from_records(data, index=['shared_images']).T
Output:
shared_images
(1, 2) [1, 3]
(1, 3) [2, 3]
(2, 3) [3]
So users 1 and 2 share images 1 and 3, users 1 and 3 share images 2 and 3, and users 2 and 3 share image 3.
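If you would rather stay inside pandas for the pairwise step, one possible alternative (a different technique from the `itertools.combinations` approach above, sketched here under the assumption that the data is just the rows shown in the question) is to self-merge the table on `image` and group by the pair of users. The `_a`/`_b` suffixes are names chosen for this example:

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumption: the "..." rows are omitted).
df = pd.DataFrame({
    "image": [1, 2, 3, 1, 3, 2, 3],
    "user":  [1, 1, 1, 2, 2, 3, 3],
})

# Self-join on image: every resulting row pairs two users who both have that image.
pairs = df.merge(df, on="image", suffixes=("_a", "_b"))

# Keep each unordered pair once and drop rows that pair a user with themselves.
pairs = pairs[pairs["user_a"] < pairs["user_b"]]

# Sort by image first so each collected list comes out in ascending order.
shared = (
    pairs.sort_values("image")
         .groupby(["user_a", "user_b"])["image"]
         .agg(list)
)
print(shared)
```

Note that the self-merge can grow large when many users share the same image, so the set-based version may be preferable for big tables. Pairs of users with no image in common do not appear in `shared` at all, whereas the `combinations` version lists them with an empty list.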
