I want to filter group with specific values. Take the sample data below:
df = pd.DataFrame({'comp': ['a','b','a','b','c','d', 'c', 'c', 'd'], 'grid': [1,2,2,4,1,2,1,2,2]})
The df:
comp grid val
0 a 1 10
1 b 2 11
2 a 2 12
3 b 4 13
4 c 1 14
5 d 2 15
6 c 1 16
7 c 2 17
8 d 2 18
I want to groupby comp and check if in each group the grid has at least one value 1 and one value 2. The expect outcome is group a and group c
comp grid val
0 a 1 10
2 a 2 12
4 c 1 14
6 c 1 16
7 c 2 17
I try with groupby and transform but not successful
CodePudding user response:
You can use GroupBy.transform with compare sets:
#if contains 1,2 and some another values
df = df[df.groupby('comp')['grid'].transform(lambda x: set(x) >= set([1,2]))]
print (df)
comp grid
0 a 1
2 a 2
4 c 1
6 c 1
7 c 2
#if contains only 1,2
df = df[df.groupby('comp')['grid'].transform(lambda x: set(x) == set([1,2]))]
print (df)
comp grid
0 a 1
2 a 2
4 c 1
6 c 1
7 c 2
If need at least one 1 and one 2 is possible use:
intersect = np.intersect1d(df.loc[df['grid'].eq(1),'comp'], df.loc[df['grid'].eq(2),'comp'])
print (intersect)
['a' 'c']
df = df[df['comp'].isin(intersect)]
print (df)
comp grid
0 a 1
2 a 2
4 c 1
6 c 1
7 c 2
