Let's say I have a dataframe as follows:
Group | Source | Name
___________________________
A | X | Jolly
A | X | Stone
A | X | Jolly
A | Y | Sand
B | X | Sand
B | X | Stone
B | Y | Stone
C | X | Sand
C | X | Stone
I want to find all Groups where each group of Sources share no common Names. Essentially in the example above, I want Group A as all groupings of Source (X and Y) Names share no common values. For this example we can assume there will only be 2 Sources (X and Y) and not all Groups have more than 1 Source. I am only interested in Groups with both Sources X and Y and no Name intersection.
The resulting DataFrame should look like this:
Group | Source | Name
___________________________
A | X | Jolly
A | X | Stone
A | X | Jolly
A | Y | Sand
I have tried, doing a grouby on Group then supplied a function to the chained filter method like so:
def find_no_intersection(df):
return (
len(df[df.Source == 'X'].Name.values) > 0 and
len(df[df.Source == 'Y'].Name.values) > 0 and
(
len(
set(df[df.Source == 'X'].Name.values) &
set(df[df.Source == 'Y'].Name.values)
) == 0
)
)
df.groupby(['Group']).filter(find_no_intersection)
Is this the right way? Is there a better way?
CodePudding user response:
Here is a way using nunique()
df.loc[df.groupby('Group')['Name'].transform(lambda x: x.size == x.nunique())]
Output:
Group Source Name
0 A X Jolly
1 A X Stone
2 A Y Sand
Update to answer:
(df.loc[
df['Group'].map(
df.groupby(['Group','Source'])['Name']
.agg(set)
.groupby(level=0)
.agg(lambda x: len(set.intersection(*x))==0))
])
Output:
Group Source Name
0 A X Jolly
1 A X Stone
2 A X Jolly
3 A Y Sand
CodePudding user response:
If I understand correctly, you can do that with the following.
df[~df['Group'].isin(df[df[['Source','Name']].duplicated()]['Group'])]
