I have two data frames:
df1 =
val1 val2
date
2020-01-01 42.353719 34.823934
2020-01-02 42.255172 34.683523
2020-01-03 46.049281 37.036076
2020-01-04 41.026087 36.340164
2020-01-05 32.300000 18.835484
df2 =
val3 val4
date
2020-01-01 01:00:00 01:00 4021.28 570.06
2020-01-01 02:00:00 01:00 4149.82 842.25
2020-01-01 03:00:00 01:00 4307.65 1638.57
2020-01-02 04:00:00 01:00 4428.95 2109.98
2020-01-02 05:00:00 01:00 4542.47 2365.25
2020-01-02 05:00:00 01:00 4462.47 2325.25
...
I would then like the val1 and val2 from df1 to be inserted for all rows with the same date (they don't have the same date format of course) in df2, i.e.:
df_final =
val1 val2 val3 val4
date
2020-01-01 01:00:00 01:00 42.353719 34.823934 4021.28 570.06
2020-01-01 02:00:00 01:00 42.353719 34.823934 4149.82 842.25
2020-01-01 03:00:00 01:00 42.353719 34.823934 4307.65 1638.57
2020-01-02 04:00:00 01:00 42.255172 34.683523 4428.95 2109.98
2020-01-02 05:00:00 01:00 42.255172 34.683523 4542.47 2365.25
2020-01-02 05:00:00 01:00 42.255172 34.683523 4462.47 2325.25
How can this be done?
CodePudding user response:
I don't know if it is straightforward but here is a solution:
import pandas as pd
import datetime
df1.index = pd.to_datetime(df1.index)
df1["Date_merge"] = df1.index.date
df2.index = pd.to_datetime(df2.index)
df2["Date_merge"] = df2.index.date
df_final = (
df1.merge(df2, on="Date_merge")
.set_index(df2.index)
.rename_axis("date")
.drop("Date_merge", axis=1)
)
print(df_final)
val1 val2 val3 val4
date
2020-01-01 01:00:00 01:00 42.353719 34.823934 4021.28 570.06
2020-01-01 02:00:00 01:00 42.353719 34.823934 4149.82 842.25
2020-01-01 03:00:00 01:00 42.353719 34.823934 4307.65 1638.57
2020-01-02 04:00:00 01:00 42.255172 34.683523 4428.95 2109.98
2020-01-02 05:00:00 01:00 42.255172 34.683523 4462.47 2325.25
If someone wants to reconstruct the data, here df1 & df2 I made by the given tables:
df1 = pd.DataFrame(
{
"val1": {
"2020-01-01": 42.353719,
"2020-01-02": 42.255172,
"2020-01-03": 46.049281,
"2020-01-04": 41.026087,
"2020-01-05": 32.3,
},
"val2": {
"2020-01-01": 34.823934,
"2020-01-02": 34.683523,
"2020-01-03": 37.036076,
"2020-01-04": 36.340164,
"2020-01-05": 18.835484,
},
}
)
df2 = pd.DataFrame(
{
"val3": {
"2020-01-01 01:00:00 01:00": 4021.28,
"2020-01-01 02:00:00 01:00": 4149.82,
"2020-01-01 03:00:00 01:00": 4307.65,
"2020-01-02 04:00:00 01:00": 4428.95,
"2020-01-02 05:00:00 01:00": 4462.47,
},
"val4": {
"2020-01-01 01:00:00 01:00": 570.06,
"2020-01-01 02:00:00 01:00": 842.25,
"2020-01-01 03:00:00 01:00": 1638.57,
"2020-01-02 04:00:00 01:00": 2109.98,
"2020-01-02 05:00:00 01:00": 2325.25,
},
}
)
CodePudding user response:
Here is my suggestion. You create a new helper column called 'datem' which is the date without the time. Then use this column for the merge process.
See the script.
import pandas as pd
import io
df1="""date val1 val2
2020-01-01 42.353719 34.823934
2020-01-02 42.255172 34.683523
2020-01-03 46.049281 37.036076
2020-01-04 41.026087 36.340164
2020-01-05 32.300000 18.835484
"""
df2="""date val3 val4
2020-01-01 01:00:00 01:00 4021.28 570.06
2020-01-01 02:00:00 01:00 4149.82 842.25
2020-01-01 03:00:00 01:00 4307.65 1638.57
2020-01-02 04:00:00 01:00 4428.95 2109.98
2020-01-02 05:00:00 01:00 4542.47 2365.25
2020-01-02 05:00:00 01:00 4462.47 2325.25
"""
df1 = pd.read_csv(io.StringIO(df1), sep=" ",engine='python')
df1['date'] = pd.to_datetime(df1['date'], format='%Y-%m-%d')
df2 = pd.read_csv(io.StringIO(df2), sep=" ",engine='python')
df2['date'] = df2['date'].astype('datetime64[ns]')
df2['datem'] = df2['date'].dt.floor('d') ##helper column for the merge.
df1.merge(df2,left_on=['date'],right_on=['datem'])
Result is here:
date_x val1 val2 date_y val3 val4 datem
0 2020-01-01 42.353719 34.823934 2020-01-01 00:00:00 4021.28 570.06 2020-01-01
1 2020-01-01 42.353719 34.823934 2020-01-01 01:00:00 4149.82 842.25 2020-01-01
2 2020-01-01 42.353719 34.823934 2020-01-01 02:00:00 4307.65 1638.57 2020-01-01
3 2020-01-02 42.255172 34.683523 2020-01-02 03:00:00 4428.95 2109.98 2020-01-02
4 2020-01-02 42.255172 34.683523 2020-01-02 04:00:00 4542.47 2365.25 2020-01-02
5 2020-01-02 42.255172 34.683523 2020-01-02 04:00:00 4462.47 2325.25 2020-01-02
Let me know what you think.
