I'm trying to generate a frequency DataFrame in Jupyter/Python of HH:MM:SS elements of a datetime column.
Asside from itterating through all HH:MM:SS combinations and counting them (I need to include 0 values), is there a function in Python that can do it for me?
.value_counts() creates what I need, however, 0 values are not included.
Many thanks, in advance, for your assistance :)
EDIT:
Example Data:
| TransactionID | DateTime | Date | Time |
|---|---|---|---|
| 012sad9j20j | 01/01/22 04:23:32 | 01/01/22 | 04:23:32 |
| 938hfd82dj2 | 07/04/22 23:12:59 | 07/04/22 | 23:12:59 |
| s9j20jd902j | 18/05/22 13:44:19 | 18/05/22 | 13:44:19 |
Expected to generate a dataframe containing:
| Time | Count |
|---|---|
| 04:23:31 | 0 |
| 04:23:32 | 1 |
| 04:23:33 | 0 |
CodePudding user response:
Here's a solution:
from pandas import DataFrame
from datetime import datetime, time
df = DataFrame([
{'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=16, hour=13, minute=1, second=2)},
{'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=16, hour=13, minute=3, second=4)},
{'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=17, hour=13, minute=5, second=6)},
{'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=18, hour=13, minute=1, second=2)},
{'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=19, hour=13, minute=3, second=4)}
])
times = DataFrame(df['DateTime'].dt.time).groupby(['DateTime'])['DateTime'].count()
time_counts = DataFrame((
(t := time(h, m, s), int(times[t]) if t in times else 0)
for h in range(24) for m in range(60) for s in range(60)
), columns=['Time', 'Counts'])
print(time_counts)
print(time_counts[time_counts['Time'] == time(13, 1, 2)])
This basically creates an additional DataFrame times based on the original data (assuming you don't even have a Time and Date column, but if you do, you can of course use those) - times takes all the times from df and then groups and counts them.
However, that is missing the times which don't occur in df, so time_counts is constructed by generated all possible times and either selecting the count from times or 0 if it doesn't exist in times.
Result:
Time Counts
0 00:00:00 0
1 00:00:01 0
2 00:00:02 0
3 00:00:03 0
4 00:00:04 0
... ... ...
86395 23:59:55 0
86396 23:59:56 0
86397 23:59:57 0
86398 23:59:58 0
86399 23:59:59 0
[86400 rows x 2 columns]
Time Counts
46862 13:01:02 2
User @riley's suggestion seems to point at a nicer solution, something like:
# create df as before, then:
df['Time'] = df['DateTime'].dt.time.astype('category')
result = df.value_counts('Time')
But that seems to have the same problem as you originally stated, even though the dtype of the 'Time' column is category. Perhaps someone has additional suggestions to make that work.
