I have the following dataframe:
Time response_a response_b
2022-01-22 16:00:00.222 101.01 0.5
2022-01-22 16:00:00.347 101.7 0.6
...
2022-01-22 16:00:01.100 102 0.7
2022-01-22 16:00:01.255 103 0.8
and I would like to get the following:
Time response_a response_b response_a_lagged response_b_lagged
2022-01-22 16:00:00.222 101.01 0.5 103 0.8
2022-01-22 16:00:00.347 101.7 0.6 etc etc
...
2022-01-22 16:00:01.100 102 0.7 etc etc
2022-01-22 16:00:01.255 103 0.8 etc etc
Time is a DateTimeIndex. I would like to get two new column with response_a lagged by x seconds (for example 1 second) and same thing for response_b. In the example above, if there's no value exactly 1 second later, it should take the next one directly available.
I have tried df.shift(periods=1,freq='s') and df.shift(periods=1000,freq='ms')
but I get the following error: "cannot reindex from a duplicate axis"
I did a SELECT DISTINCT to get the data from the DB so I think I shouldn't have any duplicated index?
Thank you !
CodePudding user response:
col = ['Time','response_a','response_b' ]
data = [
['2022-01-22 16:00:00.222',101.1,0.5],
['2022-01-22 16:00:00.347',101.7,0.6],
['2022-01-22 16:00:01.100',102,0.7],
['2022-01-22 16:00:01.255',103,0.8],
]
df = pd.DataFrame(data, columns=col)
df['Time'] = pd.to_datetime(df['Time'])
df_temp = df.copy()
df_temp['Time'] -= pd.Timedelta(seconds=1)
pd.merge_asof(df, df_temp, on='Time', suffixes = ('', '_lagged'), direction='forward')
Your criteria is to match the first record at least 1 second later, you want pd.merge_asof for this kind of work where the matching is not exact.
However, pd.merge_asof does not take argument like 1 second so a little trick here is to create a dummy df_temp. Today you only need a minimum of 1 second, but in the future if you need also a maximum, please read about the tolerance argument for pd.merge_asof
reference: https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html
