Suppose now I have some group data like
| GroupID | ID | Rank | target |
|---|---|---|---|
| A | 1 | 1 | 0 |
| A | 2 | 3 | 0 |
| A | 3 | 2 | 1 |
| B | 1 | 1 | 0 |
| B | 2 | 4 | 0 |
| B | 3 | 3 | 1 |
| B | 4 | 2 | 0 |
| C | 1 | 1 | 1 |
| C | 2 | 4 | 0 |
| C | 3 | 3 | 1 |
| C | 4 | 2 | 0 |
| D | 1 | 1 | 0 |
| D | 2 | 4 | 0 |
| D | 3 | 3 | 0 |
| D | 4 | 2 | 0 |
For each group,
I want to filter the group which has no rows which target=1.
Then I want to keep the row which target==1 and the rows which rank is higher than it. Some group may have many rows which target==1, and we choose the one which rank is lower as our target. For example for group C, the ID=1 and ID=3 all have target==1, we will keep the rows which the rank<=3. So we will get
| GroupID | ID | Rank | target |
|---|---|---|---|
| A | 1 | 1 | 0 |
| A | 3 | 2 | 1 |
| B | 1 | 1 | 0 |
| B | 3 | 3 | 1 |
| B | 4 | 2 | 0 |
| C | 1 | 1 | 1 |
| C | 3 | 3 | 1 |
| C | 4 | 2 | 0 |
CodePudding user response:
Replace Rank in Series.where if target is not 1 and then use GroupBy.transform for maximal Rank per group, so possible compare Rank column in boolean indexing by Series.le for less or equal:
s = df['Rank'].where(df['target'].eq(1)).groupby(df['GroupID']).transform('max')
df = df[df['Rank'].le(s)]
print (df)
GroupID ID Rank target
0 A 1 1 0
2 A 3 2 1
3 B 1 1 0
5 B 3 3 1
6 B 4 2 0
7 C 1 1 1
9 C 3 3 1
10 C 4 2 0
Details:
print (df['Rank'].where(df['target'].eq(1)))
0 NaN
1 NaN
2 2.0
3 NaN
4 NaN
5 3.0
6 NaN
7 1.0
8 NaN
9 3.0
10 NaN
Name: Rank, dtype: float64
print (s)
0 2.0
1 2.0
2 2.0
3 3.0
4 3.0
5 3.0
6 3.0
7 3.0
8 3.0
9 3.0
10 3.0
Name: Rank, dtype: float64
CodePudding user response:
IIUC, make a first pass to slice the rows with target == 1 (using eq), then get the max rank per group using GroupBy.max and select the rows with this maximum rank per group with classical boolean indexing using le:
thresh = df[df['target'].eq(1)].groupby('GroupID')['Rank'].max()
out = df[df['Rank'].le(df['GroupID'].map(thresh))]
output:
GroupID ID Rank target
0 A 1 1 0
2 A 3 2 1
3 B 1 1 0
5 B 3 3 1
6 B 4 2 0
7 C 1 1 1
9 C 3 3 1
10 C 4 2 0
thresholds:
>>> thresh
GroupID
A 2
B 3
C 3
