Home > database >  Determine if Values are within range based on pandas DataFrame column
Determine if Values are within range based on pandas DataFrame column

Time:01-04

I am trying to determine whether or a given value in a row of a DataFrame is within two other columns from a separate DataFrame, or if that estimate is zero.

import pandas as pd

df = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7,8], [-2, 10, 11, 13], [5, 6, 8, 9]],
                  columns=['lo1', 'up1','lo2', 'up2'])

   lo1  up1  lo2  up2
0   -1    2    1    3
1    4    6    7    8
2   -2   10   11   13
3    5    6    8    9

df2 = pd.DataFrame([[1, 3], [4, 6] , [5, 8], [10, 2,]],
                   columns=['pe1', 'pe2'])

   pe1  pe2
0    1    3
1    4    6
2    5    8
3   10    2

To be more clear, is it possible to develop a for-loop or use a function that can look at pe1 and its corresponding values and determine if they are within lo1 and up1, if lo1 and up1 cross zero, and if pe1=0? I am having a hard time coding this in Python.

EDIT: I'd like the output to be something like:

   m1  m2
0   0   3
1   4   0
2   0   0
3   0   0

Since the only pe that falls within its corresponding lo and up column are in the first row, second column, and second row, first column.

CodePudding user response:

You can eventually concatenate the two dataframes along the horizontal axis and then use np.where. This has a similar behaviour as where used by RJ Adriaansen.

import pandas as pd
import numpy as np

# Data
df1 = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7,8], [-2, 10, 11, 13], [5, 6, 8, 9]],
                  columns=['lo1', 'up1','lo2', 'up2'])


df2 = pd.DataFrame([[1, 3], [4, 6] , [5, 8], [10, 2,]],
                   columns=['pe1', 'pe2'])

# concatenate dfs
df = pd.concat([df1, df2], axis=1)

where now df looks like

   lo1  up1  lo2  up2  pe1  pe2
0   -1    2    1    3    1    3
1    4    6    7    8    4    6
2   -2   10   11   13    5    8
3    5    6    8    9   10    2

Finally we use np.where and between

for k in [1, 2]:
    df[f"m{k}"] = np.where(
        (df[f"pe{k}"].between(df[f"lo{k}"], df[f"up{k}"]) & 
         df[f"lo{k}"].gt(0)),
        df[f"pe{k}"],
        0)

and the result is

   lo1  up1  lo2  up2  pe1  pe2  m1  m2
0   -1    2    1    3    1    3   0   3
1    4    6    7    8    4    6   4   0
2   -2   10   11   13    5    8   0   0
3    5    6    8    9   10    2   0   0

CodePudding user response:

You can create a boolean mask for the required condition. For pe1 that would be:

  • value in lo1 is smaller or equal to pe1
  • value in up1 is larger or equal to pe1
  • value in lo1 is larger than 0

This would make this mask:

(df['lo1'] <= df2['pe1'])  &  (df['up1'] >= df2['pe1']) & (df['lo1'] > 0)

which returns:

0    False
1     True
2    False
3    False
dtype: bool

Now you can use where to keep the values that match True and replace those who don't with 0:

df2['pe1'] = df2['pe1'].where((df['lo1'] <= df2['pe1']) & (df['up1'] >= df2['pe1']) & (df['lo1'] > 0), other=0)
df2['pe2'] = df2['pe2'].where((df['lo2'] <= df2['pe2']) & (df['up2'] >= df2['pe2']) & (df['lo2'] > 0), other=0)

Result:

pe1 pe2
0 0 3
1 4 0
2 0 0
3 0 0

To loop all columns:

for i in df2.columns:
    nr = i[2:] #remove the first two characters to get the number, then use that number to match the columns in the other df
    df2[i] = df2[i].where((df[f'lo{nr}'] <= df2[i]) & (df[f'up{nr}'] >= df2[i]) & (df[f'lo{nr}'] > 0), other=0)
  •  Tags:  
  • Related