Pandas create a mask based on multiple thresholds

Time:01-19

Problem:

Let's say there is a pandas DataFrame:

import pandas as pd

d = {'A': [0.1, 0.4, 0.2], 'B': [0.7, 0.3, 0.2], 'Z': [0.5, 0.3, 0.4], 'sth': ["abc", "something", "unimportant"]}
df = pd.DataFrame(data=d)
df
     A    B    Z          sth
0  0.1  0.7  0.5          abc
1  0.4  0.3  0.3    something
2  0.2  0.2  0.4  unimportant
thresholds = {'A': 0.5, 'B': 0.8, 'Z': 0.3}

I want a mask that is True for each row whose highest value among the n class columns (here A, B and Z) is lower than the threshold defined for the class in which that highest value occurs.

For the given example, the correct mask would be:

[ True, True, False ]

Explanation:

  1. Row 0 has its highest value, 0.7, in column B; the threshold for class B is 0.8, hence True.
  2. Row 1 has its highest value, 0.4, in column A; the threshold for class A is 0.5, hence True.
  3. Row 2 has its highest value, 0.4, in column Z; the threshold for class Z is 0.3, hence False.
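The three steps above amount to: find the column holding each row's maximum, look up that column's threshold, and compare. A minimal pandas sketch, assuming the class columns are exactly the keys of `thresholds` (so `sth` is ignored); `classes`, `winner` and `row_threshold` are illustrative names, not taken from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [0.1, 0.4, 0.2],
    'B': [0.7, 0.3, 0.2],
    'Z': [0.5, 0.3, 0.4],
    'sth': ["abc", "something", "unimportant"],
})
thresholds = {'A': 0.5, 'B': 0.8, 'Z': 0.3}

# Assumption: the class columns are the keys of the thresholds dict
classes = df[list(thresholds)]
# Label of the column holding each row's maximum (first one on ties)
winner = classes.idxmax(axis=1)
# Threshold of that winning class, aligned to df's index
row_threshold = winner.map(thresholds)
# Row maximum strictly lower than its class threshold
mask = classes.max(axis=1) < row_threshold
print(mask.tolist())  # [True, True, False]
```

On ties, `idxmax` returns the first matching column; the question does not say which class should win in that case.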

Answer:

# compares each column's overall maximum with its threshold (one result per column)
[df[col].max() < thresholds[col] for col in thresholds]

Answer:

You could solve it the simple way (if you don't necessarily need the dict):

# one result per column: each column's maximum vs. its threshold (lt is a strict "<")
df[["A", "B", "Z"]].max().lt([0.5, 0.8, 0.3])

Output:

A     True
B     True
Z     False
dtype: bool
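Both answers compare each column's overall maximum with its threshold, which gives one Boolean per column (A, B, Z) rather than one per row; with this data the two lists happen to coincide. A NumPy-based sketch of the row-wise check that still uses the `thresholds` dict, assuming the same `df`; `cols`, `values`, `limits` and `best` are hypothetical names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [0.1, 0.4, 0.2],
    'B': [0.7, 0.3, 0.2],
    'Z': [0.5, 0.3, 0.4],
    'sth': ["abc", "something", "unimportant"],
})
thresholds = {'A': 0.5, 'B': 0.8, 'Z': 0.3}

cols = list(thresholds)                           # class columns, in dict order
values = df[cols].to_numpy()                      # shape (n_rows, n_classes)
limits = np.array([thresholds[c] for c in cols])  # thresholds in the same order
best = values.argmax(axis=1)                      # column position of each row's max
mask = values.max(axis=1) < limits[best]          # row max vs. that class's threshold
print(mask.tolist())  # [True, True, False]
```

Working on the raw array avoids mapping column labels; like `idxmax`, `argmax` picks the first column on ties.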