Selecting value from Pandas without going through .values[0]

Time:02-04

An example dataset I'm working with:

import pandas as pd
df = pd.DataFrame({"competitorname": ["3 Musketeers", "Almond Joy"], "winpercent": [67.602936, 50.347546] }, index = [1, 2])

I am trying to see whether 3 Musketeers or Almond Joy has a higher winpercent. The code I wrote is:

more_popular = '3 Musketeers' if df.loc[df["competitorname"] == '3 Musketeers', 'winpercent'].values[0] > df.loc[df["competitorname"] == 'Almond Joy', 'winpercent'].values[0] else 'Almond Joy'

My question is

Can I select the values I am interested in without pandas returning a Series? Is there a way to just do

df[df["competitorname"] == 'Almond Joy', 'winpercent']

and then it would return a simple

50.347546

?

I know this doesn't make my code significantly shorter, but I feel like I am missing something about getting values from pandas that would help me avoid constantly adding

.values[0]

CodePudding user response:

How about simply sorting the dataframe by "winpercent" and then taking the top row?

df.sort_values(by="winpercent", ascending=False, inplace=True)

then to see the winner's row

df.head(1)

or to get the values

df.iloc[0]["winpercent"]
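If reordering the dataframe in place is not wanted, a similar result can probably be had without sorting. The sketch below is not part of the answer above; it assumes the same two-row df from the question and uses Series.idxmax on the winpercent column to find the winning row's label, then looks up the name at that label:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "competitorname": ["3 Musketeers", "Almond Joy"],
        "winpercent": [67.602936, 50.347546],
    },
    index=[1, 2],
)

# idxmax returns the index label of the row with the largest winpercent
winner_label = df["winpercent"].idxmax()

# .loc with a single row label and a single column label returns a scalar
more_popular = df.loc[winner_label, "competitorname"]
print(more_popular)
# 3 Musketeers
```

Unlike sort_values(..., inplace=True), this leaves the row order of df untouched.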

CodePudding user response:

If you're sure that the returned Series has a single element, you can simply use .item() to get it:

import pandas as pd
df = pd.DataFrame({
    "competitorname": ["3 Musketeers", "Almond Joy"], 
    "winpercent": [67.602936, 50.347546]
}, index = [1, 2])

s = df.loc[df["competitorname"] == 'Almond Joy', 'winpercent']  # a pandas Series
print(s)
# output
# 2    50.347546
# Name: winpercent, dtype: float64

v = df.loc[df["competitorname"] == 'Almond Joy', 'winpercent'].item()  # a scalar value
print(v)
# output
# 50.347546
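If lookups by name are frequent, one further option (an assumption on my part, not something the question requires) is to make competitorname the index. Provided the names are unique, label-based selection then returns a scalar directly, with no .values[0] or .item():

```python
import pandas as pd

df = pd.DataFrame(
    {
        "competitorname": ["3 Musketeers", "Almond Joy"],
        "winpercent": [67.602936, 50.347546],
    },
    index=[1, 2],
)

# Series of winpercent keyed by competitor name (assumes names are unique)
winpercent = df.set_index("competitorname")["winpercent"]

# Indexing a Series with a unique label gives a scalar, not a Series
almond = winpercent["Almond Joy"]
print(almond)
# 50.347546

more_popular = (
    "3 Musketeers"
    if winpercent["3 Musketeers"] > winpercent["Almond Joy"]
    else "Almond Joy"
)
print(more_popular)
# 3 Musketeers
```

If a name appeared more than once, winpercent["Almond Joy"] would return a Series again, so this only helps when the uniqueness assumption holds.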

CodePudding user response:

The underlying issue is that there could be multiple matches, so we will always need to extract the match(es) at some point in the pipeline:

  • Use Series.idxmax on the boolean mask

    Since False is 0 and True is 1, using Series.idxmax on the boolean mask will give you the index of the first True:

    df.loc[df['competitorname'].eq('Almond Joy').idxmax(), 'winpercent']
    # 50.347546
    

    This assumes there is at least one True match. If nothing matches, the mask is all False, idxmax returns the first index label, and you silently get the first row's value instead.

  • Or use Series.item on the result

    This is similar to Series.values[0], except that it also checks that the Series holds a single element:

    df.loc[df['competitorname'].eq('Almond Joy'), 'winpercent'].item()
    # 50.347546
    

    This assumes there is exactly 1 True match, otherwise it will throw a ValueError.
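To make the difference between the two options concrete, here is a small sketch of what happens when nothing matches. The name "Snickers" is a made-up value that is not in the example data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "competitorname": ["3 Musketeers", "Almond Joy"],
        "winpercent": [67.602936, 50.347546],
    },
    index=[1, 2],
)

# "Snickers" is hypothetical and absent from df, so the mask is all False
mask = df["competitorname"].eq("Snickers")

# idxmax on an all-False mask falls back to the first label,
# so this silently returns the winpercent of 3 Musketeers
wrong = df.loc[mask.idxmax(), "winpercent"]
print(wrong)
# 67.602936

# item() refuses to guess: an empty selection raises ValueError
raised = False
try:
    df.loc[mask, "winpercent"].item()
except ValueError:
    raised = True
print(raised)
# True
```

Checking mask.any() before using idxmax is one way to guard against the silent fallback.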
