I have a DataFrame with one row per minute. I need to access the row corresponding to the current minute:
                          value
2022-01-12 11:27:24+01:00     a
2022-01-12 11:28:41+01:00     b
2022-01-12 11:29:36+01:00     c
2022-01-12 11:30:11+01:00     d
2022-01-12 11:31:03+01:00     e
2022-01-12 11:32:39+01:00     f
I have to match just the minute. I tried finding the current time both with pandas and datetime (code to reproduce it):
import pandas as pd
import numpy as np
import string
import datetime
start_idx=(datetime.datetime.now()).strftime(format="%Y-%m-%d %H:%M")
end_idx=(datetime.datetime.now() + datetime.timedelta(minutes=5)).strftime(format="%Y-%m-%d %H:%M")
index_today = pd.date_range(start=start_idx, end=end_idx, freq='1T',tz='Europe/Rome')
# create random seconds
index_today=[i + pd.DateOffset(seconds=np.random.randint(0,59)) for i in index_today]
df = pd.DataFrame(index=index_today, data=list(string.ascii_lowercase[0:len(index_today)]),columns=['value'])
now_pandas = pd.to_datetime("now").round(freq='min').tz_localize('utc').tz_convert('Europe/Rome')
now_datetime = datetime.datetime.now().strftime(format="%Y-%m-%d %H:%M")
out_pandas=df.loc[df.index.floor('Min')==now_pandas, :]
out_datetime=df.loc[now_datetime, :]
print('now pandas is ',now_pandas)
print('now datetime is ',now_datetime)
print('Current value found with Pandas:\n',out_pandas)
print('Current value found with datetime\n',out_datetime)
But sometimes they give different results:
now pandas is  2022-01-12 11:46:00+01:00
now datetime is  2022-01-12 11:45
Current value found with Pandas:
                           value
2022-01-12 11:46:08+01:00     b
Current value found with datetime
                           value
2022-01-12 11:45:35+01:00     a
What is the best and most robust way to do it?
Also, I noticed that if the DataFrame is not tz-aware, pd.to_datetime("now") gives the time in UTC, so I need to localize it, convert it, and then turn it back to tz-naive. Is there a simpler solution for that?
Thank you very much!!
CodePudding user response:
How to select row(s) based on the current minute? Make sure the condition compares what you intend: floor both the index and the current time to the minute (clip to minute resolution) before testing for equality. For example:
import pandas as pd
import numpy as np
tz = 'Europe/Rome'
now = pd.Timestamp.now(tz)
print(now)
# 2022-01-12 12:11:38.796675+01:00
idx = pd.date_range(now.floor('d'), now.ceil('d'), freq='min')
df = pd.DataFrame(index=idx, data=np.random.randint(0, 5, size=idx.size), columns=['value'])
out_pandas = df.loc[df.index.floor('min') == now.floor('min'), :]
print(out_pandas)
#                            value
# 2022-01-12 12:11:00+01:00      1
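For the tz-naive case raised in the question, one possible approach is to take the current time directly in the target zone, drop the timezone, and then floor to the minute. The snippet below is only a sketch: the frame df_naive, its seconds offset, and the variable names are hypothetical and chosen just for illustration.

```python
import pandas as pd

tz = "Europe/Rome"

# Take the current time once, as wall-clock time in the target zone,
# then drop the timezone and floor to the minute.
now_naive = pd.Timestamp.now(tz).tz_localize(None).floor("min")

# Hypothetical tz-naive frame: one row per minute around "now",
# each timestamp shifted by a few seconds.
idx = pd.date_range(now_naive - pd.Timedelta(minutes=2), periods=5, freq="min")
idx = idx + pd.Timedelta(seconds=17)
df_naive = pd.DataFrame({"value": list("abcde")}, index=idx)

# Floor the index to the minute and compare with the floored current time.
current = df_naive.loc[df_naive.index.floor("min") == now_naive]
print(current)
```

Because the current time is read only once and both sides are floored, the comparison does not depend on how many seconds have elapsed within the minute.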
CodePudding user response:
The reason for the difference is rather simple.
With pandas you are rounding to the nearest minute, so the result moves to the closest value (i.e. the next minute if the time is past second 30).
With datetime formatting you are merely dropping the seconds.
You can get the same behaviour either by using floor instead of round in pandas, or by rounding datetime.now() to the closest minute before formatting the result.
Or you may simply not care about the difference now that you know where it comes from, and just stick with the pandas method (which is what I would do).
Note: another minor difference is that you are calling now() at two different times in your code, once for pandas and once for datetime, so there is already a small time difference between the two. But that is not the main effect.
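To make the round/floor difference concrete, here is a small sketch using a fixed timestamp (a hypothetical value taken from the question's output, with seconds past the half minute) instead of the real current time:

```python
import pandas as pd

# Fixed, hypothetical timestamp with seconds > 30.
ts = pd.Timestamp("2022-01-12 11:45:35", tz="Europe/Rome")

rounded = ts.round("min")                # nearest minute
floored = ts.floor("min")                # start of the current minute
dropped = ts.strftime("%Y-%m-%d %H:%M")  # formatting simply drops the seconds

print(rounded)  # 2022-01-12 11:46:00+01:00
print(floored)  # 2022-01-12 11:45:00+01:00
print(dropped)  # 2022-01-12 11:45
```

Only floor agrees with the strftime result, which is why flooring is the safer choice when the goal is "the row for the current minute".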
