I have a dataframe that looks like the following image.
This is what df.head().to_dict() produces:
The Time Occurred field holds a time in HHMM format, but its dtype is int. I'd like to create a new field, "Hour", from the "Time Occurred" field that shows only the hour. For instance, where the value is 450 I'd like 4 in the Hour field. I tried the following code:
s = df['Time Occurred'].astype(str)
df['Hour'] = pd.to_timedelta(s.str[:2] + ':' + s.str[2:], unit='h')
But I get the following error: ValueError: unit must not be specified if the input contains a str
I found several posts on Stack Overflow that address extracting the time from a datetime, but that isn't the problem I'm trying to solve.
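Separately from the to_timedelta error, taking the first two characters of the string breaks on three-digit values: '450'[:2] is '45', not '04'. A minimal sketch of the slicing idea that zero-pads to four digits first (the helper name hour_by_slicing is hypothetical, chosen for illustration):

```python
def hour_by_slicing(hhmm: int) -> int:
    """Return the hour of an HHMM-encoded int by zero-padding, then slicing."""
    s = str(hhmm).zfill(4)  # 450 -> "0450", 2320 -> "2320", 1 -> "0001"
    return int(s[:2])       # "04" -> 4, "23" -> 23, "00" -> 0

for t in [1, 450, 2320, 545]:
    print(t, "->", hour_by_slicing(t))
```

On a pandas column the same steps vectorize as s.str.zfill(4).str[:2].astype(int), where s is the string-converted column from the question.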
Answer:
pandas.to_datetime will help. Its format option lets you convert a string in a given format into a datetime. We parse "HHMM" using format='%H%M' (see the format codes of the datetime module).
To answer the question:
import pandas as pd

# a toy dataframe (values as strings; for an int column, call .astype(str) first)
df = pd.DataFrame({'Time Occurred': {0: '450', 1: '2320', 2: '545'}})
# parse the time strings
date = pd.to_datetime(df['Time Occurred'], format='%H%M')
# make new columns
df['Hours'] = date.dt.hour
df['Minutes'] = date.dt.minute
print(df)
  Time Occurred  Hours  Minutes
0           450      4       50
1          2320     23       20
2           545      5       45
Note that you can also parse date strings into datetime objects using datetime.strptime.
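A per-value sketch with datetime.strptime, using the same %H%M directives. It also illustrates why padding matters: in CPython's strptime, %H takes two digits whenever it can, so an unpadded '130' is read as 13:00 rather than 01:30, and a single digit such as '1' fails to parse. As far as I know, pandas' format-based parsing follows the same matching rules, so zero-padded input is the safer choice there too. The helper name parse_hhmm is hypothetical.

```python
from datetime import datetime

def parse_hhmm(value: str) -> datetime:
    """Parse an HHMM string; the date part defaults to 1900-01-01."""
    return datetime.strptime(value, "%H%M")

# Zero-padded input parses unambiguously.
padded = parse_hhmm("0130")
print(padded.hour, padded.minute)      # 1 30

# Unpadded input: %H grabs two digits when it can, so "130" becomes 13:00.
unpadded = parse_hhmm("130")
print(unpadded.hour, unpadded.minute)  # 13 0

# Padding the int first avoids the ambiguity.
fixed = parse_hhmm(str(130).zfill(4))
print(fixed.hour, fixed.minute)        # 1 30
```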
Answer:
One way using pandas.Series.str.zfill:
import pandas as pd

s = pd.Series([1, 450, 2320, 545, 350, 2100])
pd.to_datetime(s.astype(str).str.zfill(4), format="%H%M")
Output:
0 1900-01-01 00:01:00
1 1900-01-01 04:50:00
2 1900-01-01 23:20:00
3 1900-01-01 05:45:00
4 1900-01-01 03:50:00
5 1900-01-01 21:00:00
Explanation:
str.zfill(n) pads its inputs with zeros on the left up to width n, so your ints become strings of exactly 4 digits (the values here never exceed 4 digits):
0 0001
1 0450
2 2320
3 0545
4 0350
5 2100
pd.to_datetime then parses those strings according to format, i.e. HHMM (%H%M).
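Because the column is already an int in HHMM form, the hour and minute can also be taken arithmetically with no string parsing at all: the hour is the value floor-divided by 100, and the minute is the remainder. This is an alternative to both answers rather than something either one proposes; a sketch with a hypothetical helper split_hhmm:

```python
def split_hhmm(hhmm: int) -> tuple[int, int]:
    """Split an HHMM-encoded int into (hour, minute)."""
    hour, minute = divmod(hhmm, 100)  # 450 -> (4, 50), 2100 -> (21, 0)
    return hour, minute

for t in [1, 450, 2320, 545, 350, 2100]:
    print(t, split_hhmm(t))
```

On a pandas column the same arithmetic is elementwise: df['Hour'] = df['Time Occurred'] // 100 and df['Minute'] = df['Time Occurred'] % 100. Unlike the to_datetime approaches, this does not check that the values are valid times (e.g. that minutes are below 60).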

