I have two data frames:
df1 =
date col1 col2 col3 col4 col5 col6 bool
-----------------------------------------------------------------------------------------
2022-02-03 00:00:00 0.0 1.2 3.3 10 1000 40 False
2022-02-03 01:00:00 0.0 2.1 3.1 11 1500 50 False
2022-02-03 02:00:00 1.0 5.2 13.3 12 1600 60 False
2022-02-03 03:00:00 2.0 1.4 5.6 13 1700 70 False
df2 =
date col1 col3 col6
------------------------------------------------
2022-02-03 00:00:00 0.0 3.3 40
2022-02-03 03:00:00 2.0 5.6 70
So df2 only has some of the columns from df1.
What I would like to do is set the bool value of the bool column in df1 to True if ALL column values in a row in df2 are true in df1. So basically the resulting data frame should be:
df_final =
date col1 col2 col3 col4 col5 col6 bool
-----------------------------------------------------------------------------------------
2022-02-03 00:00:00 0.0 1.2 3.3 10 1000 40 True
2022-02-03 01:00:00 0.0 2.1 3.1 11 1500 50 False
2022-02-03 02:00:00 1.0 5.2 13.3 12 1600 60 False
2022-02-03 03:00:00 2.0 1.4 5.6 13 1700 70 True
I was thinking I could probably just do some iterrows and loop through each row or something like that, but I actually have quite a lot of rows, so for performance I was hoping there might be a better option to achieve this ?
CodePudding user response:
In [82]: df = pd.merge(df1, df2, how="outer", indicator="bool")
In [83]: df
Out[83]:
date col1 col2 col3 col4 col5 col6 bool
0 2022-02-03 00:00:00 0.0 1.2 3.3 10 1000 40 both
1 2022-02-03 01:00:00 0.0 2.1 3.1 11 1500 50 left_only
2 2022-02-03 02:00:00 1.0 5.2 13.3 12 1600 60 left_only
3 2022-02-03 03:00:00 2.0 1.4 5.6 13 1700 70 both
In [84]: df["bool"] = df["bool"].map({"left_only": False, "both": True})
In [85]: df
Out[85]:
date col1 col2 col3 col4 col5 col6 bool
0 2022-02-03 00:00:00 0.0 1.2 3.3 10 1000 40 True
1 2022-02-03 01:00:00 0.0 2.1 3.1 11 1500 50 False
2 2022-02-03 02:00:00 1.0 5.2 13.3 12 1600 60 False
3 2022-02-03 03:00:00 2.0 1.4 5.6 13 1700 70 True
