Pandas - get first occurrence of a given column value

Time:02-01

I have a df with repeated names for different rounds of a tournament, like so:

name   round_id price_open
John       1     5.0
Paul       1     4.0
John       2     5.4
Paul       2     3.4
John       3     5.0
Paul       3     4.0

But at round 3, a new player enters the tournament:

...
George     3     6.0
...

Let's say I need to pull out all the starting prices, like so:

df_open = df[df['round_id']==1]['price_open']

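This drops George altogether, because he has no round 1 row (he only shows up as NaN once the result is lined up against all players), which is not what I need.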
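A minimal sketch of the problem, assuming the rows shown above are the whole dataframe (the question elides some rows with "..."). The round 1 filter simply has no row for George; the NaN only appears when the result is aligned back onto the full list of names, here with `reindex`:

```python
import pandas as pd

# Sample data reconstructed from the question (assumed to be complete).
df = pd.DataFrame({
    "name": ["John", "Paul", "John", "Paul", "John", "Paul", "George"],
    "round_id": [1, 1, 2, 2, 3, 3, 3],
    "price_open": [5.0, 4.0, 5.4, 3.4, 5.0, 4.0, 6.0],
})

# Filtering on round 1 keeps only players who actually played round 1.
round1 = df.loc[df["round_id"] == 1, ["name", "price_open"]]
print(round1)

# Aligning that result onto every player is where George's NaN comes from.
aligned = round1.set_index("name")["price_open"].reindex(df["name"].unique())
print(aligned)
```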


So how do I filter this df to get the first opening price for every player, ending up with:

name  price_open
John   5.0
Paul   4.0
George 6.0 

Answer:

Use drop_duplicates to keep the first instance of each name:

>>> df.drop_duplicates('name')
     name  round_id  price_open
0    John         1         5.0
1    Paul         1         4.0
6  George         3         6.0
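A short sketch of the same idea that also trims the columns down to the requested output. It assumes the sample data from the question and that "first" means the earliest round_id; the stable sort is only a safeguard in case the real rows are not already in round order (`keep="first"` is already the default):

```python
import pandas as pd

# Sample data reconstructed from the question (assumed to be complete).
df = pd.DataFrame({
    "name": ["John", "Paul", "John", "Paul", "John", "Paul", "George"],
    "round_id": [1, 1, 2, 2, 3, 3, 3],
    "price_open": [5.0, 4.0, 5.4, 3.4, 5.0, 4.0, 6.0],
})

# Sort by round (stable, so ties keep their original order), then keep the
# first row per name and only the two columns asked for.
first_open = (
    df.sort_values("round_id", kind="stable")
      .drop_duplicates(subset="name", keep="first")[["name", "price_open"]]
)
print(first_open)
```

The original index labels (0, 1, 6) are preserved; chain `.reset_index(drop=True)` if a fresh 0..n-1 index is preferred.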

Answer:

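You can find the index label of the first occurrence of each name in your dataframe with Series.idxmax (on a boolean Series it returns the label of the first True value), then use the loc method to select those rows. Since idxmax returns labels rather than positions, loc is the right indexer; iloc only happens to work with the default 0..n-1 index. Note that the rows come back in value_counts order (most frequent name first), not in order of appearance:

Using a list comprehension:

df.loc[[df.name.eq(name).idxmax() for name in df['name'].value_counts().index]]

Without a list comprehension:

indexes = []
for name in df['name'].value_counts().index:
  # label of the first row where this name appears
  indexes.append(df.name.eq(name).idxmax())
df.loc[indexes]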

Output:

     name  round_id  price_open
1    Paul         1         4.0
0    John         1         5.0
6  George         3         6.0
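A sketch of the idxmax approach on a dataframe with a non-default index, to show why the labels it returns belong with loc rather than iloc. The index values 100..106 are hypothetical, chosen only for the illustration, and `unique()` is swapped in for `value_counts().index` so the players come back in order of first appearance:

```python
import pandas as pd

# Sample data from the question, with hypothetical non-default index labels.
df = pd.DataFrame(
    {
        "name": ["John", "Paul", "John", "Paul", "John", "Paul", "George"],
        "round_id": [1, 1, 2, 2, 3, 3, 3],
        "price_open": [5.0, 4.0, 5.4, 3.4, 5.0, 4.0, 6.0],
    },
    index=[100, 101, 102, 103, 104, 105, 106],
)

# On a boolean Series, idxmax returns the label of the first True value.
labels = [df["name"].eq(name).idxmax() for name in df["name"].unique()]

# Labels go through .loc ...
first_rows = df.loc[labels, ["name", "price_open"]]
print(first_rows)

# ... while .iloc treats them as positions, which are out of bounds here.
try:
    df.iloc[labels]
    iloc_failed = False
except IndexError:
    iloc_failed = True
```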

[EDIT]: I wasn't aware that drop_duplicates can check just a subset of columns; if it meets your requirement, it is the better option :).
