I have a df with repeated names for different rounds of a tournament, like so:
name round_id price_open
John 1 5.0
Paul 1 4.0
John 2 5.4
Paul 2 3.4
John 3 5.0
Paul 3 4.0
But at round 3, a new player enters the tournament:
...
George 3 6.0
...
Let's say I need to pull out every player's starting price, like so:
df_open = df[df['round_id']==1]['price_open']
This misses George, since he has no round 1 row, which is not what I need.
So how do I filter this df to get the first opening price for every player, ending up with:
name price_open
John 5.0
Paul 4.0
George 6.0
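A minimal reproduction of the data. It assumes the rows elided with "..." contain no other players, so George's round 3 row is the only addition. It shows why the round 1 filter cannot work: George has no row with round_id == 1, so nothing for him survives the filter.

```python
import pandas as pd

# Sample data from the question (assumption: the "..." rows add nothing else).
df = pd.DataFrame({
    "name": ["John", "Paul", "John", "Paul", "John", "Paul", "George"],
    "round_id": [1, 1, 2, 2, 3, 3, 3],
    "price_open": [5.0, 4.0, 5.4, 3.4, 5.0, 4.0, 6.0],
})

# Filtering on round 1 only keeps players who actually played round 1,
# so George (who joined in round 3) is not in the result at all.
df_open = df[df["round_id"] == 1]["price_open"]
print(df_open)
```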
User response:
Use drop_duplicates to keep the first instance of each name:
>>> df.drop_duplicates('name')
name round_id price_open
0 John 1 5.0
1 Paul 1 4.0
6 George 3 6.0
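A sketch that narrows the drop_duplicates result to the two columns the question asks for. drop_duplicates keeps the first occurrence by default (keep='first'), so it relies on the rows already being in round order; the stable sort on round_id is an added assumption-guard, not part of the answer above, and can be dropped if the data is known to be ordered.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Paul", "John", "Paul", "John", "Paul", "George"],
    "round_id": [1, 1, 2, 2, 3, 3, 3],
    "price_open": [5.0, 4.0, 5.4, 3.4, 5.0, 4.0, 6.0],
})

first_open = (
    df.sort_values("round_id", kind="mergesort")  # mergesort is stable: ties keep their order
      .drop_duplicates("name")                    # keep='first' is the default
      .loc[:, ["name", "price_open"]]             # only the columns asked for
      .reset_index(drop=True)
)
print(first_open)
```

With the sample data this yields John 5.0, Paul 4.0, George 6.0, matching the desired output.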
User response:
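You can find the index label of the first occurrence of each name in your dataframe using Series.idxmax (on a boolean mask it returns the label of the first True value), then select those rows with the .loc accessor. Because idxmax returns labels rather than positions, .loc is the right choice; .iloc only gives the same result while the index is the default 0..n-1 range.
Using a list comprehension:
df.loc[[df['name'].eq(name).idxmax() for name in df['name'].value_counts().index]]
Without a list comprehension:
indexes = []
for name in df['name'].value_counts().index:
    # label of the first row whose name matches
    indexes.append(df['name'].eq(name).idxmax())
df.loc[indexes]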
Output:
| | name | round_id | price_open |
|---|---|---|---|
| 1 | Paul | 1 | 4.0 |
| 0 | John | 1 | 5.0 |
| 6 | George | 3 | 6.0 |
[EDIT]: I wasn't aware of the parameters accepted by the drop_duplicates method; if it meets your requirement, it is the better option :).
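A related idea, swapped in here rather than taken from either answer: instead of one idxmax call per name, groupby can return, for every name at once, the label of the row with the smallest round_id. This is a sketch assuming "first" means "lowest round_id"; unlike the first-occurrence approaches, it does not depend on the row order.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Paul", "John", "Paul", "John", "Paul", "George"],
    "round_id": [1, 1, 2, 2, 3, 3, 3],
    "price_open": [5.0, 4.0, 5.4, 3.4, 5.0, 4.0, 6.0],
})

# For each name, the index label of the row with that player's lowest round_id.
first_rows = df.groupby("name")["round_id"].idxmin()

# These are labels, not positions, so select with .loc.
result = df.loc[first_rows.to_numpy(), ["name", "price_open"]]
print(result)
```

groupby sorts the names by default, so the rows come out as George, John, Paul; pass sort=False to groupby to keep the order of first appearance instead.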
