Home > Software design >  How to insert missing row into pandas dataframe?
How to insert missing row into pandas dataframe?

Time:01-14

Each row in this database represents 1 minute. But some minutes are missing upon pulling the data from API (You'll see 09:51:00 is missing)

  ticker        date    time         vol    vwap    open    high    low    close    lbh lah trades
0   AACG    2022-01-06  09:30:00    33042   1.8807  1.8900  1.9200  1.8700  1.9017  0.0 0.0 68
1   AACG    2022-01-06  09:31:00    5306    1.9073  1.9100  1.9200  1.8801  1.9100  0.0 0.0 27
2   AACG    2022-01-06  09:32:00    3496    1.8964  1.9100  1.9193  1.8800  1.8900  0.0 0.0 17
3   AACG    2022-01-06  09:33:00    5897    1.9377  1.8900  1.9500  1.8900  1.9500  0.0 0.0 15
4   AACG    2022-01-06  09:34:00    1983    1.9362  1.9200  1.9499  1.9200  1.9200  0.0 0.0 9
5   AACG    2022-01-06  09:35:00    10725   1.9439  1.9400  1.9600  1.9201  1.9306  0.0 0.0 87
6   AACG    2022-01-06  09:36:00    5942    1.9380  1.9307  1.9400  1.9300  1.9400  0.0 0.0 48
7   AACG    2022-01-06  09:37:00    5759    1.9428  1.9659  1.9659  1.9400  1.9500  0.0 0.0 11
8   AACG    2022-01-06  09:38:00    4855    1.9424  1.9500  1.9500  1.9401  1.9495  0.0 0.0 10
9   AACG    2022-01-06  09:39:00    6275    1.9514  1.9500  1.9700  1.9450  1.9700  0.0 0.0 14
10  AACG    2022-01-06  09:40:00    13695   2.0150  1.9799  2.0500  1.9749  2.0200  0.0 0.0 59
11  AACG    2022-01-06  09:41:00    3252    2.0209  2.0275  2.0300  2.0200  2.0200  0.0 0.0 14
12  AACG    2022-01-06  09:42:00    12082   2.0117  2.0300  2.0400  1.9800  1.9900  0.0 0.0 41
13  AACG    2022-01-06  09:43:00    5148    1.9802  1.9800  1.9999  1.9750  1.9999  0.0 0.0 11
14  AACG    2022-01-06  09:44:00    2764    1.9927  1.9901  1.9943  1.9901  1.9943  0.0 0.0 5
15  AACG    2022-01-06  09:45:00    2379    1.9576  1.9601  1.9601  1.9201  1.9201  0.0 0.0 10
16  AACG    2022-01-06  09:46:00    8762    1.9852  1.9550  1.9900  1.9550  1.9900  0.0 0.0 35
17  AACG    2022-01-06  09:47:00    1343    1.9704  1.9700  1.9738  1.9700  1.9701  0.0 0.0 5
18  AACG    2022-01-06  09:48:00    17080   1.9696  1.9700  1.9800  1.9600  1.9600  0.0 0.0 9
19  AACG    2022-01-06  09:49:00    9004    1.9600  1.9600  1.9600  1.9600  1.9600  0.0 0.0 9
20  AACG    2022-01-06  09:50:00    9224    1.9603  1.9600  1.9613  1.9600  1.9613  0.0 0.0 4
21  AACG    2022-01-06  09:52:00    16914   1.9921  1.9800  2.0400  1.9750  2.0399  0.0 0.0 67
22  AACG    2022-01-06  09:53:00    4665    1.9866  1.9900  2.0395  1.9801  1.9900  0.0 0.0 37
23  AACG    2022-01-06  09:55:00    2107    2.0049  1.9900  2.0100  1.9900  2.0099  0.0 0.0 10
24  AACG    2022-01-06  09:56:00    3003    2.0028  2.0000  2.0099  2.0000  2.0099  0.0 0.0 23
25  AACG    2022-01-06  09:57:00    8489    2.0272  2.0100  2.0400  2.0100  2.0300  0.0 0.0 34
26  AACG    2022-01-06  09:58:00    6050    2.0155  2.0300  2.0300  2.0150  2.0150  0.0 0.0 6
27  AACG    2022-01-06  09:59:00    61623   2.0449  2.0300  2.0700  2.0300  2.0699  0.0 0.0 83
28  AACG    2022-01-06  10:00:00    19699   2.0856  2.0699  2.1199  2.0600  2.1100  0.0 0.0 54

I want to insert rows with empty values that only include the missing time data as a value.

missing_data = pd.DataFrame({'ticker': ['AACG'], 'date': ['2022-01-06'], 'time': ['09:51:00'], 
                                 'vol': [0], 'vwap': [0.0], 'open': [0.0], 'high': [0.0], 'low': [0.0], 
                                 'close': [0.0], 'lbh': [0.0], 'lah': [0.0], 'trades': [0]}, index=[21])

It would look something like this:

   ticker       date    time       vol  vwap    open high   low close lbh lah trades
21  AACG    2022-01-06  09:51:00    0   0.00  0.00  0.00  0.00  0.00  0.0 0.0 0

With the help of someone, I've managed to isolate the areas that show me where the missing values are at:

time_in_minutes = pd.to_timedelta(df['time'].astype(str)).astype('timedelta64[m]')
indices_where_the_next_minute_is_missing = np.where(np.diff(time_in_minutes) != 1)[0]
out = df.loc[indices_where_the_next_minute_is_missing]

Simply adding 1 to time_in_minutes will give me the correction I need:

timeinminutesplus1 = pd.to_timedelta(out['time'].astype(str)).astype('timedelta64[m]')   1

But how do i turn it back to a datetime.time datatype and insert it into the database?

CodePudding user response:

Building off of my answer to your new_df

start_datetime = '2022-01-06 09:35:00'
end_datetime = '2022-01-06 9:40:00'
cols = ['ticker','date','time','vol','vwap','open','high','low','close','lbh','lah','trades']
other_df = pd.DataFrame(columns=cols,
                      index=pd.date_range(start=start_datetime,
                                          end=end_datetime, freq='min'))

other_df['date'] = [d.date() for d in other_df.index]
other_df['time'] = [d.time() for d in other_df.index]
other_df.fillna(3)

other_df

final_df = new_df.copy()
final_df.update(other_df)

final_df

  •  Tags:  
  • Related