Unable to math values based on timestamps due to differing years in python-CodePudding

I have this dataframe A:

   timestamp     value_A
-------------------------
 5/3/16 8:00          64
 5/3/16 9:00          74
5/3/16 10:00          54
5/3/16 11:00          34
5/3/16 12:00          26
5/3/16 13:00          42
5/3/16 14:00          44
5/3/16 15:00          14
5/3/16 16:00          65
5/3/16 19:00          36
5/3/16 20:00          32
5/3/16 23:00          32
...

I want to merge this dataframe with this dataframe B:

     Month     value_B
-----------------------
05-03 08:00         35
05-03 09:00         44
05-03 10:00         22
05-03 11:00         25
05-03 12:00         75 
05-03 13:00         64
05-03 14:00         43
05-03 15:00         44
05-03 16:00         26
05-03 17:00         22 
05-03 18:00         35 
05-03 19:00         36 
05-03 20:00         32
05-03 21:00         26
05-03 22:00         44 
05-03 23:00         22 
...

I want to merge the dataframe based on matching day/month/hour/minute timestamps, with no year.

Notice that there are skipped timestamp entries in dataframe A, and so dataframe B has more rows. What I want to do is match up the rows by timestamp and create a new dataframe with columns for both value_A and value_B, forgetting the value_B rows in dataframe_B that do not have a corresponding timestamp with dataframe A. And so here is the dataframe I want to produce dataframe C:

   timestamp     value_A      value_B
--------------------------------------  
 5/3/16 8:00          64           35    
 5/3/16 9:00          74           44
5/3/16 10:00          54           22
5/3/16 11:00          34           25
5/3/16 12:00          26           75
5/3/16 13:00          42           64
5/3/16 14:00          44           43
5/3/16 15:00          14           44
5/3/16 16:00          65           26
5/3/16 19:00          36           36
5/3/16 20:00          32           32
5/3/16 23:00          32           22

I would like the value_B values to match all timestamps on the corresponding month and day, regardless of the year. By this I mean that a value of let's say 55 in dataframe B at let's say 5/25, should be matched to timestamps of both 5/25/16 and also 5/25/17.

I am having a lot of trouble with this, because it seems no matter what I try, when I send the final merged dataframe to a csv and then look at it in Excel, I keep seeing the year attached, which means means that for the given above example, 5/25/16 would be matched to 55, but 5/25/17 would have no value. How can I go about accomplishing this in a way that distributes that matches value_B values to dataframe A timestamps, based on just the month/day/hour/minute timestamp?

CodePudding user response：

First, you will have to get both dataframes in same format.

For df1:

df1["timestamp"] = pd.to_datetime(df1["timestamp"])
df1["timestamp"] = df1["timestamp"].dt.strftime("%d-%m %H:%M")

For df2:

df2["Month"] = pd.to_datetime(df2["Month"], format="%d-%m %H:%M")
df2["Month"] = df2["Month"].dt.strftime("%m-%d %H:%M")

Then, we will merge it into a single dataframe and drop one column from timestamp or Month.

df3 = pd.merge(df1, df2, left_on="timestamp", right_on="Month")
df3.drop(['Month'], axis=1, inplace=True)

This will produce the same output you are looking for!

CodePudding user response：

First convert column timestamp to datetimes and then create new column Month with same format like :

df1['Month'] = pd.to_datetime(df1['timestamp']).dt.strftime('%m-%d %H:%M')

And then use left join with second DataFrame with remove Month column:

df = df1.merge(df2, on='Month', how='left').drop('Month', axis=1)

print (df)
       timestamp  value_A  value_B
0    5/3/16 8:00       64       35
1    5/3/16 9:00       74       44
2   5/3/16 10:00       54       22
3   5/3/16 11:00       34       25
4   5/3/16 12:00       26       75
5   5/3/16 13:00       42       64
6   5/3/16 14:00       44       43
7   5/3/16 15:00       14       44
8   5/3/16 16:00       65       26
9   5/3/16 19:00       36       36
10  5/3/16 20:00       32       32
11  5/3/16 23:00       32       22