I'm trying to compare two columns r1 and r2 in the pandas dataframe by row, essentially I want to return the 'red' when value in r1 is greater than that of r2 in the same row, while the outcome is that the comparison only returns yellow in the else condition even two values are provided.
Below is the dataset for reproducing the issue.
import pandas as pd
import numpy as np
from numpy import nan
time_idx = pd.date_range('2021-01-05', periods=8, freq='3H')
r1 = np.concatenate([[nan,nan,nan],235,324,125,[nan,nan]], axis=None)
r2 = np.concatenate([114,nan,345,235,nan,125,178,324], axis=None)
df_r = pd.DataFrame({'r1': r1,'r2':r2},index = time_idx)
df_r["Light"]=['red' if df_r.loc[t,'r1'] > df_r.loc[t,'r2']
else 'yellow' for t in df_r.index]
print(df_r)
And here are the outputs:
r1 r2 Light
2021-01-05 00:00:00 NaN 114.0 yellow
2021-01-05 03:00:00 NaN NaN yellow
2021-01-05 06:00:00 NaN 345.0 yellow
2021-01-05 09:00:00 235.0 235.0 yellow
2021-01-05 12:00:00 324.0 NaN yellow
2021-01-05 15:00:00 125.0 125.0 yellow
2021-01-05 18:00:00 NaN 178.0 yellow
2021-01-05 21:00:00 NaN 324.0 yellow
My question goes to as follows:
- How can I achieve the goal intended while sticking at using list comprehension?
Thank you.
CodePudding user response:
To have red where r1==324, replace np.nan by -np.inf:
df_r['Light'] = df_r['r1'].gt(df_r['r2'].fillna(-np.inf)) \
.replace({True: 'red', False: 'yellow'})
print(df)
# Output:
r1 r2 Light
2021-01-05 00:00:00 NaN 114.0 yellow
2021-01-05 03:00:00 NaN NaN yellow
2021-01-05 06:00:00 NaN 345.0 yellow
2021-01-05 09:00:00 235.0 235.0 yellow
2021-01-05 12:00:00 324.0 NaN red
2021-01-05 15:00:00 125.0 125.0 yellow
2021-01-05 18:00:00 NaN 178.0 yellow
2021-01-05 21:00:00 NaN 324.0 yellow
.gt is equivalent to the operator >.
