Pandas: Get the list of values in column A that two different values from column B have in common

Time:02-01

I have a table like this:

image user
1 1
2 1
3 1
1 2
3 2
2 3
3 3
... ...

Now I want to find shared images between two users. For example, when I ask for the images shared between user 1 and user 2, I want [1, 3] as a result.

How could I achieve this?

Answer:

You can do this by collecting each user's images into a set and intersecting the two users' sets:

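import pandas as pd
df = pd.DataFrame({"image": [1, 2, 3, 1, 3, 2, 3], "user": [1, 1, 1, 2, 2, 3, 3]})
# Map each user to the set of images they have
seen = df.groupby("user")["image"].apply(set)
# Intersect user 1's and user 2's sets; sorted() gives a stable order
shared = sorted(seen[1].intersection(seen[2]))
print(shared)
[1, 3]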
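If you only ever need one pair at a time, a minimal sketch is to filter just the two users' rows instead of grouping the whole frame. It assumes the same df with image and user columns as in the question; shared_images is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def shared_images(df: pd.DataFrame, user_a, user_b) -> list:
    """Hypothetical helper: sorted images that both user_a and user_b have."""
    # Build a set of images for each of the two users only
    images_a = set(df.loc[df["user"] == user_a, "image"])
    images_b = set(df.loc[df["user"] == user_b, "image"])
    # Sorting gives a deterministic order, unlike iterating over a set
    return sorted(images_a & images_b)


# Sample data from the question
df = pd.DataFrame({"image": [1, 2, 3, 1, 3, 2, 3],
                   "user":  [1, 1, 1, 2, 2, 3, 3]})
print(shared_images(df, 1, 2))
```

A user that does not appear in the table simply contributes an empty set, so the result is an empty list rather than a KeyError.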

Answer:

Expanding on @BrendanA's answer, you can use itertools.combinations to enumerate every pair of users, find the images each pair shares, and collect the results in a DataFrame:

from itertools import combinations
import pandas as pd
# Map each user to the set of images they have
users_to_images = df.groupby('user')['image'].agg(set)
# For every pair of users (i, j), store the sorted images both of them have
data = {(i, j): [sorted(users_to_images[i] & users_to_images[j])]
        for i, j in combinations(users_to_images.index, 2)}
out = pd.DataFrame.from_records(data, index=['shared_images']).T

Output:

       shared_images
(1, 2)        [1, 3]
(1, 3)        [2, 3]
(2, 3)           [3]

So users 1 and 2 share images 1 and 3, users 1 and 3 share images 2 and 3, and users 2 and 3 share only image 3.
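As an alternative to looping over combinations in Python, a sketch that does the pairing with a self-merge on the image column could look like this. It assumes the same image and user columns; duplicate (image, user) rows are collapsed by the set() call. Unlike the combinations version, pairs of users with no image in common do not appear in the result at all.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"image": [1, 2, 3, 1, 3, 2, 3],
                   "user":  [1, 1, 1, 2, 2, 3, 3]})

# Self-join on "image": each row pairs two users who both have that image
pairs = df.merge(df, on="image", suffixes=("_a", "_b"))
# Keep each unordered pair once and drop users paired with themselves
pairs = pairs[pairs["user_a"] < pairs["user_b"]]

# Collect the shared images for each (user_a, user_b) pair
out = (pairs.groupby(["user_a", "user_b"])["image"]
            .agg(lambda s: sorted(set(s)))
            .rename("shared_images"))
print(out)
```

The trade-off is memory: the merge produces one row per pair of users for every image, so an image held by many users multiplies the intermediate frame accordingly.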
