Pandas - get first occurrence of a given column value

I have a df with repeated names for different rounds of a tournament, like so:

name   round_id price_open
John       1     5.0
Paul       1     4.0
John       2     5.4
Paul       2     3.4
John       3     5.0
Paul       3     4.0

But at round 3, a new player enters the tournament:

...
George     3     6.0
...

Let's say I need to filter down to all starting prices, like so:

df_open = df[df['round_id']==1]['price_open']

This leaves George out (or gives NaN for him once the result is aligned against all players), since he has no row for round 1, which is not what I need.
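
For anyone who wants to reproduce this, here is a minimal sketch. The frame is rebuilt from the sample rows above, with George assumed to be the last row. Setting name as the index and the reindex step are my additions, only there to make the missing value visible.

```python
import pandas as pd

# Sample data rebuilt from the rows shown above (George assumed to be the 7th row).
df = pd.DataFrame({
    "name": ["John", "Paul", "John", "Paul", "John", "Paul", "George"],
    "round_id": [1, 1, 2, 2, 3, 3, 3],
    "price_open": [5.0, 4.0, 5.4, 3.4, 5.0, 4.0, 6.0],
})

# Round 1 filter, keyed by name (set_index is an addition for illustration).
df_open = df[df["round_id"] == 1].set_index("name")["price_open"]

# George has no round 1 row, so he is missing from df_open;
# aligning against every player turns that gap into NaN.
aligned = df_open.reindex(df["name"].unique())
print(aligned)
```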

So how do I filter this df to get the first opening price for every player, ending up with the following?

name  price_open
John   5.0
Paul   4.0
George 6.0

CodePudding user response：

Use drop_duplicates to keep the first row for each name (this relies on the rows already being ordered by round_id):

>>> df.drop_duplicates('name')
     name  round_id  price_open
0    John         1         5.0
1    Paul         1         4.0
6  George         3         6.0
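
A slightly fuller sketch of the same approach, in case the rows are not guaranteed to be ordered by round. The sort step and the column selection are my assumptions about what is wanted, and the frame is rebuilt from the sample in the question.

```python
import pandas as pd

# Sample data rebuilt from the question (George assumed to be the 7th row).
df = pd.DataFrame({
    "name": ["John", "Paul", "John", "Paul", "John", "Paul", "George"],
    "round_id": [1, 1, 2, 2, 3, 3, 3],
    "price_open": [5.0, 4.0, 5.4, 3.4, 5.0, 4.0, 6.0],
})

# Assumption: rows may arrive unordered, so sort by round first.
# mergesort is stable, so rows within the same round keep their original order.
first_rows = (
    df.sort_values("round_id", kind="mergesort")
      .drop_duplicates("name")  # keep="first" is the default
)

# Keep only the two columns from the desired output.
df_open = first_rows[["name", "price_open"]].reset_index(drop=True)
print(df_open)
```

If the data is already sorted by round, the sort_values call can be dropped and this reduces to the one-liner above.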

CodePudding user response：

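You can find the index of the first occurrence of each name in your dataframe using Series.idxmax: df.name.eq(name) is a boolean Series, and idxmax returns the index label of its first True value. With the default integer index those labels match the row positions, so you can pass them to iloc to get the correct rows (with any other index, use loc instead).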

Using a list comprehension:

df.iloc[[df.name.eq(name).idxmax() for name in df['name'].value_counts().index]]

Without a list comprehension:

indexes = []
for name in df['name'].value_counts().index:
    indexes.append(df.name.eq(name).idxmax())
df.iloc[indexes]

Output:

     name  round_id  price_open
1    Paul         1         4.0
0    John         1         5.0
6  George         3         6.0
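
A self-contained variant of the same idea, as a sketch only. It swaps value_counts().index for unique(), so the rows come out in order of first appearance, and it uses loc because idxmax returns an index label rather than a position. The frame is rebuilt from the sample in the question.

```python
import pandas as pd

# Sample data rebuilt from the question (George assumed to be the 7th row).
df = pd.DataFrame({
    "name": ["John", "Paul", "John", "Paul", "John", "Paul", "George"],
    "round_id": [1, 1, 2, 2, 3, 3, 3],
    "price_open": [5.0, 4.0, 5.4, 3.4, 5.0, 4.0, 6.0],
})

# unique() lists the names in order of first appearance.
# For each name, eq() builds a boolean mask and idxmax() returns the label of its first True.
first_labels = [df["name"].eq(name).idxmax() for name in df["name"].unique()]

# loc selects by label, so this also works when the index is not 0..n-1.
first_rows = df.loc[first_labels]
print(first_rows)
```

Note that this scans the column once per distinct name, so it gets slow when there are many players.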

[EDIT]: I did not know about the parameters accepted by the drop_duplicates method; it is better to use that if it meets your requirement :).