I am trying to
- get the average wakeup time from a list of wake up times in a df column and
- compare them to a goal wake up time (earlier or later)
import pandas as pd
#convert to datetime object
df['wakeup_time_date']=pd.to_datetime(df['wakeup_time_date'], infer_datetime_format=True)
#extract time
df['wakeup_time']=df['wakeup_time_date'].dt.time
#below doesn't work as the object is no longer a datetime object
df['wakeup_time'].mean()
And lastly, I would like to compare the average wakeup time to 06:30 am and determine it if it earlier or later.
CodePudding user response:
As MrFuppes suggested, probably the easiest way would be converting to datetime.timedeltas, instead of datetime.times:
from datetime import datetime, timedelta, time
df = pd.DataFrame({"wakeup_time_date": ["2019/03/04 07:08:58", "2019/03/05 08:08:48", "2019/03/06 10:00:12", "2019/03/10 6:35:32"]})
df['wakeup_time_date']=pd.to_datetime(df['wakeup_time_date'])
df['time'] = df['wakeup_time_date'] - df['wakeup_time_date'].dt.normalize()
mean_td = df['time'].mean()
#Timedelta('0 days 07:58:22.500000')
Now, if you don't mind operating on timedeltas, you might compare it straight away:
goal_td = timedelta(hours=6, minutes=30)
is_mean_later_than_goal = mean_td > goal_td
# True
or convert it to datetime.time and do the comparison this way:
mean_time = (datetime.min mean_td).time()
goal_time = time(hour=6, minute=30)
is_mean_later_than_goal = mean_time > goal_time
# True
CodePudding user response:
Here is a toy example using np.where() in which True rows have greater than 390 minutes (6:30 AM) and less than 720 (noon, arbitrability chosen for example). First, a minutes column is created using dt attributes.
It's hard to say without seeing your data, but if you have "a list of wake up times in a df column" I would likely set a DatetimeIndex and then do a groupby aggregated by mean time on it before the np.where() step.
from pandas import Timestamp
import pandas as pd
toy_dict = {'wakeup_time_date': {0: Timestamp('2021-11-11 13:04:44.342843'),
1: Timestamp('2021-11-11 15:43:40.654980'),
2: Timestamp('2021-11-11 17:31:07.409101'),
3: Timestamp('2021-11-11 22:19:14.394725'),
4: Timestamp('2021-11-12 09:03:41.957619'),
5: Timestamp('2021-11-12 09:46:24.849156'),
6: Timestamp('2021-11-12 10:45:07.950917'),
7: Timestamp('2021-11-12 12:27:48.189281'),
8: Timestamp('2021-11-12 12:57:27.112832'),
9: Timestamp('2021-11-12 14:06:17.432817')}}
df = pd.DataFrame(toy_dict)
df['minutes'] = df['wakeup_time_date'].dt.hour * 60 df['wakeup_time_date'].dt.minute df['wakeup_time_date'].dt.second/60
df['past_six_thirty'] = np.where((df['minutes'] > 390) & (df['minutes'] < 720), True, False)
print(df)
wakeup_time_date minutes past_six_thirty
0 2021-11-11 13:04:44.342843 784.733333 False
1 2021-11-11 15:43:40.654980 943.666667 False
2 2021-11-11 17:31:07.409101 1051.116667 False
3 2021-11-11 22:19:14.394725 1339.233333 False
4 2021-11-12 09:03:41.957619 543.683333 True
5 2021-11-12 09:46:24.849156 586.400000 True
6 2021-11-12 10:45:07.950917 645.116667 True
7 2021-11-12 12:27:48.189281 747.800000 False
8 2021-11-12 12:57:27.112832 777.450000 False
9 2021-11-12 14:06:17.432817 846.283333 False
