I am trying to determine whether or a given value in a row of a DataFrame is within two other columns from a separate DataFrame, or if that estimate is zero.
import pandas as pd
df = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7,8], [-2, 10, 11, 13], [5, 6, 8, 9]],
columns=['lo1', 'up1','lo2', 'up2'])
lo1 up1 lo2 up2
0 -1 2 1 3
1 4 6 7 8
2 -2 10 11 13
3 5 6 8 9
df2 = pd.DataFrame([[1, 3], [4, 6] , [5, 8], [10, 2,]],
columns=['pe1', 'pe2'])
pe1 pe2
0 1 3
1 4 6
2 5 8
3 10 2
To be more clear, is it possible to develop a for-loop or use a function that can look at pe1 and its corresponding values and determine if they are within lo1 and up1, if lo1 and up1 cross zero, and if pe1=0? I am having a hard time coding this in Python.
EDIT: I'd like the output to be something like:
m1 m2
0 0 3
1 4 0
2 0 0
3 0 0
Since the only pe that falls within its corresponding lo and up column are in the first row, second column, and second row, first column.
CodePudding user response:
You can eventually concatenate the two dataframes along the horizontal axis and then use np.where. This has a similar behaviour as where used by RJ Adriaansen.
import pandas as pd
import numpy as np
# Data
df1 = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7,8], [-2, 10, 11, 13], [5, 6, 8, 9]],
columns=['lo1', 'up1','lo2', 'up2'])
df2 = pd.DataFrame([[1, 3], [4, 6] , [5, 8], [10, 2,]],
columns=['pe1', 'pe2'])
# concatenate dfs
df = pd.concat([df1, df2], axis=1)
where now df looks like
lo1 up1 lo2 up2 pe1 pe2
0 -1 2 1 3 1 3
1 4 6 7 8 4 6
2 -2 10 11 13 5 8
3 5 6 8 9 10 2
Finally we use np.where and between
for k in [1, 2]:
df[f"m{k}"] = np.where(
(df[f"pe{k}"].between(df[f"lo{k}"], df[f"up{k}"]) &
df[f"lo{k}"].gt(0)),
df[f"pe{k}"],
0)
and the result is
lo1 up1 lo2 up2 pe1 pe2 m1 m2
0 -1 2 1 3 1 3 0 3
1 4 6 7 8 4 6 4 0
2 -2 10 11 13 5 8 0 0
3 5 6 8 9 10 2 0 0
CodePudding user response:
You can create a boolean mask for the required condition. For pe1 that would be:
- value in
lo1is smaller or equal tope1 - value in
up1is larger or equal tope1 - value in
lo1is larger than0
This would make this mask:
(df['lo1'] <= df2['pe1']) & (df['up1'] >= df2['pe1']) & (df['lo1'] > 0)
which returns:
0 False
1 True
2 False
3 False
dtype: bool
Now you can use where to keep the values that match True and replace those who don't with 0:
df2['pe1'] = df2['pe1'].where((df['lo1'] <= df2['pe1']) & (df['up1'] >= df2['pe1']) & (df['lo1'] > 0), other=0)
df2['pe2'] = df2['pe2'].where((df['lo2'] <= df2['pe2']) & (df['up2'] >= df2['pe2']) & (df['lo2'] > 0), other=0)
Result:
| pe1 | pe2 | |
|---|---|---|
| 0 | 0 | 3 |
| 1 | 4 | 0 |
| 2 | 0 | 0 |
| 3 | 0 | 0 |
To loop all columns:
for i in df2.columns:
nr = i[2:] #remove the first two characters to get the number, then use that number to match the columns in the other df
df2[i] = df2[i].where((df[f'lo{nr}'] <= df2[i]) & (df[f'up{nr}'] >= df2[i]) & (df[f'lo{nr}'] > 0), other=0)
