Home > Net >  Set column value to true/false depending on conditions from multiple columns
Set column value to true/false depending on conditions from multiple columns

Time:02-06

I have two data frames:

df1 =

date                    col1      col2      col3      col4      col5      col6      bool
----------------------------------------------------------------------------------------- 
2022-02-03 00:00:00     0.0       1.2       3.3       10        1000      40        False
2022-02-03 01:00:00     0.0       2.1       3.1       11        1500      50        False
2022-02-03 02:00:00     1.0       5.2       13.3      12        1600      60        False
2022-02-03 03:00:00     2.0       1.4       5.6       13        1700      70        False


df2 =

date                    col1      col3      col6
------------------------------------------------
2022-02-03 00:00:00     0.0       3.3       40
2022-02-03 03:00:00     2.0       5.6       70

So df2 only has some of the columns from df1.

What I would like to do is set the bool value of the bool column in df1 to True if ALL column values in a row in df2 are true in df1. So basically the resulting data frame should be:

df_final =

date                    col1      col2      col3      col4      col5      col6      bool
----------------------------------------------------------------------------------------- 
2022-02-03 00:00:00     0.0       1.2       3.3       10        1000      40        True
2022-02-03 01:00:00     0.0       2.1       3.1       11        1500      50        False
2022-02-03 02:00:00     1.0       5.2       13.3      12        1600      60        False
2022-02-03 03:00:00     2.0       1.4       5.6       13        1700      70        True

I was thinking I could probably just do some iterrows and loop through each row or something like that, but I actually have quite a lot of rows, so for performance I was hoping there might be a better option to achieve this ?

CodePudding user response:

In [82]: df = pd.merge(df1, df2, how="outer", indicator="bool")

In [83]: df
Out[83]: 
                  date  col1  col2  col3  col4  col5  col6       bool
0  2022-02-03 00:00:00   0.0   1.2   3.3    10  1000    40       both
1  2022-02-03 01:00:00   0.0   2.1   3.1    11  1500    50  left_only
2  2022-02-03 02:00:00   1.0   5.2  13.3    12  1600    60  left_only
3  2022-02-03 03:00:00   2.0   1.4   5.6    13  1700    70       both

In [84]: df["bool"] = df["bool"].map({"left_only": False, "both": True})

In [85]: df
Out[85]: 
                  date  col1  col2  col3  col4  col5  col6   bool
0  2022-02-03 00:00:00   0.0   1.2   3.3    10  1000    40   True
1  2022-02-03 01:00:00   0.0   2.1   3.1    11  1500    50  False
2  2022-02-03 02:00:00   1.0   5.2  13.3    12  1600    60  False
3  2022-02-03 03:00:00   2.0   1.4   5.6    13  1700    70   True
  •  Tags:  
  • Related