Determine if Values are within range based on pandas DataFrame column-CodePudding

I am trying to determine whether or a given value in a row of a DataFrame is within two other columns from a separate DataFrame, or if that estimate is zero.

import pandas as pd

df = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7,8], [-2, 10, 11, 13], [5, 6, 8, 9]],
                  columns=['lo1', 'up1','lo2', 'up2'])

   lo1  up1  lo2  up2
0   -1    2    1    3
1    4    6    7    8
2   -2   10   11   13
3    5    6    8    9

df2 = pd.DataFrame([[1, 3], [4, 6] , [5, 8], [10, 2,]],
                   columns=['pe1', 'pe2'])

   pe1  pe2
0    1    3
1    4    6
2    5    8
3   10    2

To be more clear, is it possible to develop a for-loop or use a function that can look at pe1 and its corresponding values and determine if they are within lo1 and up1, if lo1 and up1 cross zero, and if pe1=0? I am having a hard time coding this in Python.

EDIT: I'd like the output to be something like:

Since the only pe that falls within its corresponding lo and up column are in the first row, second column, and second row, first column.

CodePudding user response：

You can eventually concatenate the two dataframes along the horizontal axis and then use np.where. This has a similar behaviour as where used by RJ Adriaansen.

import pandas as pd
import numpy as np

# Data
df1 = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7,8], [-2, 10, 11, 13], [5, 6, 8, 9]],
                  columns=['lo1', 'up1','lo2', 'up2'])


df2 = pd.DataFrame([[1, 3], [4, 6] , [5, 8], [10, 2,]],
                   columns=['pe1', 'pe2'])

# concatenate dfs
df = pd.concat([df1, df2], axis=1)

where now df looks like

   lo1  up1  lo2  up2  pe1  pe2
0   -1    2    1    3    1    3
1    4    6    7    8    4    6
2   -2   10   11   13    5    8
3    5    6    8    9   10    2

Finally we use np.where and between

for k in [1, 2]:
    df[f"m{k}"] = np.where(
        (df[f"pe{k}"].between(df[f"lo{k}"], df[f"up{k}"]) & 
         df[f"lo{k}"].gt(0)),
        df[f"pe{k}"],
        0)

and the result is

   lo1  up1  lo2  up2  pe1  pe2  m1  m2
0   -1    2    1    3    1    3   0   3
1    4    6    7    8    4    6   4   0
2   -2   10   11   13    5    8   0   0
3    5    6    8    9   10    2   0   0

CodePudding user response：

You can create a boolean mask for the required condition. For pe1 that would be:

value in lo1 is smaller or equal to pe1
value in up1 is larger or equal to pe1
value in lo1 is larger than 0

This would make this mask:

(df['lo1'] <= df2['pe1'])  &  (df['up1'] >= df2['pe1']) & (df['lo1'] > 0)

which returns:

0    False
1     True
2    False
3    False
dtype: bool

Now you can use where to keep the values that match True and replace those who don't with 0:

df2['pe1'] = df2['pe1'].where((df['lo1'] <= df2['pe1']) & (df['up1'] >= df2['pe1']) & (df['lo1'] > 0), other=0)
df2['pe2'] = df2['pe2'].where((df['lo2'] <= df2['pe2']) & (df['up2'] >= df2['pe2']) & (df['lo2'] > 0), other=0)

Result:

	pe1	pe2
0	0	3
1	4	0
2	0	0
3	0	0

To loop all columns:

for i in df2.columns:
    nr = i[2:] #remove the first two characters to get the number, then use that number to match the columns in the other df
    df2[i] = df2[i].where((df[f'lo{nr}'] <= df2[i]) & (df[f'up{nr}'] >= df2[i]) & (df[f'lo{nr}'] > 0), other=0)