Home > Mobile >  Trying to match rows based on matching timestamps in python
Trying to match rows based on matching timestamps in python

Time:01-12

so I am trying to understand how I can match up the values of two columns based on matching timestamps when the timestamps are not in the same format. I believe this has to do with getting both columns into the same date-time object, but one of the columns has the time stamps components in multiple columns, so I am a bit confused here. I am working in python. I will explain further here.

I have this timestamped data, we will call dataframe A:

   timestamp     value_A
-------------------------
 5/3/16 8:00          64
 5/3/16 9:00          74
5/3/16 10:00          54
5/3/16 11:00          34
5/3/16 12:00          26
5/3/16 13:00          42
5/3/16 14:00          44
5/3/16 15:00          14
5/3/16 16:00          65
5/3/16 19:00          36
5/3/16 20:00          32
5/3/16 23:00          32
...

Now, here is timestamped dataframe B:

Month   Day   Hour    value_B
------------------------------
    5     3      8         35
    5     3      9         44
    5     3     10         22
    5     3     11         25
    5     3     12         75 
    5     3     13         64
    5     3     14         43
    5     3     15         44
    5     3     16         26
    5     3     17         22 
    5     3     18         35 
    5     3     19         36 
    5     3     20         32
    5     3     21         26
    5     3     22         44 
    5     3     23         22 
...

constructors:

df1 = pd.DataFrame({'timestamp': ['5/3/16 8:00', '5/3/16 9:00', '5/3/16 10:00', '5/3/16 11:00', 
                                  '5/3/16 12:00', '5/3/16 13:00', '5/3/16 14:00', '5/3/16 15:00', 
                                  '5/3/16 16:00', '5/3/16 19:00', '5/3/16 20:00', '5/3/16 23:00'], 
                    'value_A': [64, 74, 54, 34, 26, 42, 44, 14, 65, 36, 32, 32]})

df2 = pd.DataFrame({'Month': [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
                    'Day': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
                    'Hour': [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
                    'value_B': [35, 44, 22, 25, 75, 64, 43, 44, 26, 22, 35, 36, 32, 26, 44, 22]})

Now, while these two dataframes look similar, they are still quite different, mainly because dataframe A has the timestamp in one column, and dataframe B has the time and date across three columns. Also, notice that there are skipped timestamp entries in dataframe A, and so dataframe B has more rows. What I want to do is match up the rows by timestamp and create a new dataframe with columns for both value_A and value_B, forgetting the value_B rows in dataframe_B that do not have a corresponding timestamp with dataframe A. And so here is the dataframe I want to produce dataframe C:

   timestamp     value_A      value_B
--------------------------------------  
 5/3/16 8:00          64           35    
 5/3/16 9:00          74           44
5/3/16 10:00          54           22
5/3/16 11:00          34           25
5/3/16 12:00          26           75
5/3/16 13:00          42           64
5/3/16 14:00          44           43
5/3/16 15:00          14           44
5/3/16 16:00          65           26
5/3/16 19:00          36           36
5/3/16 20:00          32           32
5/3/16 23:00          32           22

I am confused about how to work out the logic so that a timestamp can be pieced together from my Month, Day, and Hour columns, and also how to indicate that I want rows without a matching timestamp to be omitted. How can I match up these two dataframes into dataframe C so that the timestamped rows are matched up?

CodePudding user response:

You can convert the values of "Month", "Day", "Hour" columns to list of lists and then using list comprehension, format each sublist to the correct datetime format and assign the final list of strings to a new timestamp column. Then convert both timestamp columns to datetimes (actually this step is not needed for this exercise but I'm assuming you'll need it later).

Finally merge df1 to df2 on timestamp from the left side:

df2['timestamp'] = ['{}/{}/16 {}:00'.format(*lst) for lst in df2.loc[:, 'Month':'Hour'].to_numpy().tolist()]
df2['timestamp'] = pd.to_datetime(df2['timestamp'])
df1['timestamp'] = pd.to_datetime(df1['timestamp'])
df3 = df1.merge(df2[['timestamp','value_B']], on='timestamp', how='left')

Output:

             timestamp  value_A  value_B
0  2016-05-03 08:00:00       64       35
1  2016-05-03 09:00:00       74       44
2  2016-05-03 10:00:00       54       22
3  2016-05-03 11:00:00       34       25
4  2016-05-03 12:00:00       26       75
5  2016-05-03 13:00:00       42       64
6  2016-05-03 14:00:00       44       43
7  2016-05-03 15:00:00       14       44
8  2016-05-03 16:00:00       65       26
9  2016-05-03 19:00:00       36       36
10 2016-05-03 20:00:00       32       32
11 2016-05-03 23:00:00       32       22
  •  Tags:  
  • Related