I have a pandas DataFrame containing baseball fielding statistics. Each row shows how many games a player has appeared at a given position over the course of his career.
| Player | Position | Games |
|---|---|---|
| Brock Holt | 2B | 20 |
| Brock Holt | 3B | 70 |
| Ben Zobrist | OF | 100 |
| Ben Zobrist | 2B | 15 |
I want to be able to return one row per player containing the position they played the most over the years. In the example above the resulting DataFrame would be:
| Player | Position |
|---|---|
| Brock Holt | 3B |
| Ben Zobrist | OF |
I've been attempting to group by Player and aggregate by Games like so
df.groupby('Player').agg({'Games':['max']})
which simply provides the name of the player and the maximum number of games played at a position and
df.groupby('Player')['Position'].agg({'Games':['max']})
which returns the error "nested renamer is not supported."
CodePudding user response:
You could transform max and use eq to create a boolean mask where True corresponds to the most played Games row for each player. Then filter df:
msk = df.groupby('Player')['Games'].transform('max').eq(df['Games'])
out = df[msk].drop(columns='Games')
Output:
Player Position
1 Brock Holt 3B
2 Ben Zobrist OF
CodePudding user response:
You can also just use groupby and max directly as below:
df.groupby(['Player']).max()
Yields:
Position Games
Player
Ben Zobrist OF 100
Brock Holt 3B 70
