I have the following dataframe:
import pandas as pd
#Create DF
d = {
'Category': ['A','B','C','D','E','F','G'],
'Value':[10,20,30,40,50,60,70],
}
df = pd.DataFrame(data=d)
df
I then have a variable holding a number that may change:
e.g. val = 45
How do I search the dataframe column Value and return the index number of the row where val would fit?
Expected results:
for val = 45 the index I would like returned is 4
for val = 22 the index I would like returned is 2
for val = 60 the index I would like returned is 6 (on an exact match, val should go after the matching row)
Any help would be greatly appreciated!
CodePudding user response:
You can just use argmax:
(df['Value'] > 45).argmax() # 4
(df['Value'] > 22).argmax() # 2
(df['Value'] > 60).argmax() # 6
This assumes 'Value' is sorted. It works because the comparison produces a boolean array, and argmax returns the position of the first occurrence of the maximum, which is the first True value.
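To make the reasoning above concrete, here is a small sketch that prints the intermediate boolean mask before taking argmax. It rebuilds the dataframe from the question; the names mask and position are only illustrative, and converting with to_numpy() is my own choice so that the result is purely positional.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'Value': [10, 20, 30, 40, 50, 60, 70],
})

val = 45
mask = df['Value'] > val   # boolean Series, True where Value exceeds val
print(mask.tolist())       # [False, False, False, False, True, True, True]

# argmax on the underlying NumPy array returns the position of the first True
position = int(mask.to_numpy().argmax())
print(position)            # 4
```

Because the column is sorted, every row after the first True is also True, so that first position is where val would fit.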
Edit
To be more rigorous, we can support numbers that are greater than any value in the array:
tmp = (df['Value'] > 100)
index = tmp.argmax() if tmp.any() else len(df)
In this case we correctly get 7, whereas argmax alone returns 0 because the mask contains no True value.
If you dislike the extra line, you can use the walrus operator (Python 3.8+):
tmp.argmax() if (tmp := (df['Value'] > 100)).any() else len(df)
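If this lookup is needed in several places, the fallback can be wrapped in a small helper. This is only a sketch: insert_position is a hypothetical name, not a pandas function, and it assumes the Series is sorted in ascending order.

```python
import pandas as pd

def insert_position(values: pd.Series, val) -> int:
    """Position of the first element greater than val, or len(values) if there is none."""
    mask = (values > val).to_numpy()
    # argmax returns 0 for an all-False mask, so check any() first
    return int(mask.argmax()) if mask.any() else len(values)

df = pd.DataFrame({'Value': [10, 20, 30, 40, 50, 60, 70]})
results = {v: insert_position(df['Value'], v) for v in (45, 22, 60, 100)}
print(results)  # {45: 4, 22: 2, 60: 6, 100: 7}
```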
Correction
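Thanks to @Bill, I can see that Series.argmax was deprecated in older pandas releases in favor of idxmax, so idxmax is the safer choice there. It returns the index label, which equals the row position here because df uses the default integer index: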
(df['Value'] > 45).idxmax() # 4
(df['Value'] > 22).idxmax() # 2
(df['Value'] > 60).idxmax() # 6
tmp.idxmax() if (tmp := (df['Value'] > 100)).any() else len(df) # 7
tmp.idxmax() if (tmp := (df['Value'] > 22)).any() else len(df) # 2
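One thing worth checking before relying on idxmax: it returns an index label rather than an integer position. The two agree for the dataframe in the question, but not in general. The sketch below uses a hypothetical letter index (my assumption, not part of the question) to show the difference.

```python
import pandas as pd

# Same values as in the question, but with a hypothetical non-default index
s = pd.Series([10, 20, 30, 40, 50, 60, 70], index=list('abcdefg'))

mask = s > 45
label = mask.idxmax()                     # index label of the first True
position = int(mask.to_numpy().argmax())  # integer position of the first True

print(label)     # e
print(position)  # 4
```

If a row number is what you need, the positional form is probably the safer one to use.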
CodePudding user response:
There are many ways to do this. This one also gives the results you asked for and, with the default integer index, returns the same values as the argmax approach.
val = 45
min(df.loc[(df['Value'] > val)].index)
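Note that the built-in min raises a ValueError when no row is greater than val, because the filtered index is empty. A possible workaround, sketched below, is to pass default= to min. The function name first_index_above is hypothetical, and the fallback of len(frame) is my assumption about what should be returned in that case.

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 20, 30, 40, 50, 60, 70]})

def first_index_above(frame: pd.DataFrame, val) -> int:
    """Smallest index label whose Value exceeds val, or len(frame) if none does."""
    matches = frame.index[frame['Value'] > val]
    # default= is returned instead of raising ValueError on an empty index
    return int(min(matches, default=len(frame)))

print(first_index_above(df, 45))   # 4
print(first_index_above(df, 100))  # 7
```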
CodePudding user response:
You can use pd.Series.searchsorted; side='right' places val after an exact match:
df.Value.searchsorted([45,22,60], side='right')
Output
array([4, 2, 6])
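For a single value, and for values beyond the end of the column, the same call can be used with a scalar. The sketch below also compares side='left' and side='right' on an exact match. The is_monotonic_increasing check is my addition, since searchsorted expects sorted input.

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 20, 30, 40, 50, 60, 70]})

# searchsorted expects the column to be sorted in ascending order
assert df['Value'].is_monotonic_increasing

single = int(df['Value'].searchsorted(45, side='right'))   # between 40 and 50
beyond = int(df['Value'].searchsorted(100, side='right'))  # larger than every value
left = int(df['Value'].searchsorted(60, side='left'))      # before the exact match
right = int(df['Value'].searchsorted(60, side='right'))    # after the exact match

print(single, beyond, left, right)  # 4 7 5 6
```

Unlike the argmax approach, no separate fallback is needed for values larger than everything in the column.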

