I have this dataframe A:
timestamp value_A
-------------------------
5/3/16 8:00 64
5/3/16 9:00 74
5/3/16 10:00 54
5/3/16 11:00 34
5/3/16 12:00 26
5/3/16 13:00 42
5/3/16 14:00 44
5/3/16 15:00 14
5/3/16 16:00 65
5/3/16 19:00 36
5/3/16 20:00 32
5/3/16 23:00 32
...
I want to merge this dataframe with this dataframe B:
Month value_B
-----------------------
05-03 08:00 35
05-03 09:00 44
05-03 10:00 22
05-03 11:00 25
05-03 12:00 75
05-03 13:00 64
05-03 14:00 43
05-03 15:00 44
05-03 16:00 26
05-03 17:00 22
05-03 18:00 35
05-03 19:00 36
05-03 20:00 32
05-03 21:00 26
05-03 22:00 44
05-03 23:00 22
...
I want to merge the dataframe based on matching day/month/hour/minute timestamps, with no year.
Notice that there are skipped timestamp entries in dataframe A, and so dataframe B has more rows. What I want to do is match up the rows by timestamp and create a new dataframe with columns for both value_A and value_B, forgetting the value_B rows in dataframe_B that do not have a corresponding timestamp with dataframe A. And so here is the dataframe I want to produce dataframe C:
timestamp value_A value_B
--------------------------------------
5/3/16 8:00 64 35
5/3/16 9:00 74 44
5/3/16 10:00 54 22
5/3/16 11:00 34 25
5/3/16 12:00 26 75
5/3/16 13:00 42 64
5/3/16 14:00 44 43
5/3/16 15:00 14 44
5/3/16 16:00 65 26
5/3/16 19:00 36 36
5/3/16 20:00 32 32
5/3/16 23:00 32 22
I would like the value_B values to match all timestamps on the corresponding month and day, regardless of the year. By this I mean that a value of let's say 55 in dataframe B at let's say 5/25, should be matched to timestamps of both 5/25/16 and also 5/25/17.
I am having a lot of trouble with this, because it seems no matter what I try, when I send the final merged dataframe to a csv and then look at it in Excel, I keep seeing the year attached, which means means that for the given above example, 5/25/16 would be matched to 55, but 5/25/17 would have no value. How can I go about accomplishing this in a way that distributes that matches value_B values to dataframe A timestamps, based on just the month/day/hour/minute timestamp?
CodePudding user response:
First, you will have to get both dataframes in same format.
For df1:
df1["timestamp"] = pd.to_datetime(df1["timestamp"])
df1["timestamp"] = df1["timestamp"].dt.strftime("%d-%m %H:%M")
For df2:
df2["Month"] = pd.to_datetime(df2["Month"], format="%d-%m %H:%M")
df2["Month"] = df2["Month"].dt.strftime("%m-%d %H:%M")
Then, we will merge it into a single dataframe and drop one column from timestamp or Month.
df3 = pd.merge(df1, df2, left_on="timestamp", right_on="Month")
df3.drop(['Month'], axis=1, inplace=True)
This will produce the same output you are looking for!
CodePudding user response:
First convert column timestamp to datetimes and then create new column Month with same format like :
df1['Month'] = pd.to_datetime(df1['timestamp']).dt.strftime('%m-%d %H:%M')
And then use left join with second DataFrame with remove Month column:
df = df1.merge(df2, on='Month', how='left').drop('Month', axis=1)
print (df)
timestamp value_A value_B
0 5/3/16 8:00 64 35
1 5/3/16 9:00 74 44
2 5/3/16 10:00 54 22
3 5/3/16 11:00 34 25
4 5/3/16 12:00 26 75
5 5/3/16 13:00 42 64
6 5/3/16 14:00 44 43
7 5/3/16 15:00 14 44
8 5/3/16 16:00 65 26
9 5/3/16 19:00 36 36
10 5/3/16 20:00 32 32
11 5/3/16 23:00 32 22
