Home > Back-end >  Get the element immediately available after x seconds in a datetime index dataframe
Get the element immediately available after x seconds in a datetime index dataframe

Time:01-24

I have the following dataframe:

Time                     response_a  response_b 
2022-01-22 16:00:00.222    101.01     0.5          
2022-01-22 16:00:00.347    101.7      0.6          
...
2022-01-22 16:00:01.100    102        0.7          
2022-01-22 16:00:01.255    103        0.8          

and I would like to get the following:

Time                     response_a  response_b  response_a_lagged     response_b_lagged
2022-01-22 16:00:00.222    101.01     0.5          103                    0.8
2022-01-22 16:00:00.347    101.7      0.6          etc                    etc
...
2022-01-22 16:00:01.100    102        0.7          etc                    etc
2022-01-22 16:00:01.255    103        0.8          etc                    etc

Time is a DateTimeIndex. I would like to get two new column with response_a lagged by x seconds (for example 1 second) and same thing for response_b. In the example above, if there's no value exactly 1 second later, it should take the next one directly available.

I have tried df.shift(periods=1,freq='s') and df.shift(periods=1000,freq='ms')
but I get the following error: "cannot reindex from a duplicate axis"

I did a SELECT DISTINCT to get the data from the DB so I think I shouldn't have any duplicated index?

Thank you !

CodePudding user response:

col = ['Time','response_a','response_b' ]

data = [
['2022-01-22 16:00:00.222',101.1,0.5],
['2022-01-22 16:00:00.347',101.7,0.6],
['2022-01-22 16:00:01.100',102,0.7],
['2022-01-22 16:00:01.255',103,0.8],
]

df = pd.DataFrame(data, columns=col)
df['Time'] = pd.to_datetime(df['Time'])

df_temp = df.copy()
df_temp['Time'] -= pd.Timedelta(seconds=1)

pd.merge_asof(df, df_temp, on='Time', suffixes = ('', '_lagged'), direction='forward')

Your criteria is to match the first record at least 1 second later, you want pd.merge_asof for this kind of work where the matching is not exact.

However, pd.merge_asof does not take argument like 1 second so a little trick here is to create a dummy df_temp. Today you only need a minimum of 1 second, but in the future if you need also a maximum, please read about the tolerance argument for pd.merge_asof

reference: https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html

  •  Tags:  
  • Related