converting time that is an int in HHMM format and creating an hour field from it using pandas

Time:01-11

I have a dataframe that looks like the following image.

[Image: dataframe]

This is what df.head().to_dict() produces:

[Output not preserved]

The "Time Occurred" field holds a time in HHMM format, but its datatype is int. I'd like to create a new field, "Hour", from "Time Occurred" that shows only the hour. For instance, where the value is 450 (4:50), I'd like 4 in the Hour field. I tried the following code:

s = df['Time Occurred'].astype(str)
df['Hour'] = pd.to_timedelta(s.str[:2] + ':' + s.str[2:], unit='h')

But I get the following error: ValueError: unit must not be specified if the input contains a str

I found several posts on Stack Overflow about extracting the time from a datetime, but that isn't the problem I'm trying to solve.
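The error comes from passing unit='h' together with string input: pd.to_timedelta accepts unit only for numeric values. There is also a slicing problem: for 450, s.str[:2] gives '45', not '4'. A minimal sketch that keeps the to_timedelta idea, assuming the column holds non-negative ints in HHMM form (toy values below, not the asker's data): zero-pad to four digits, build "HH:MM:00" strings, and parse them without unit.

```python
import pandas as pd

# Toy data in the question's shape: HHMM times stored as ints
df = pd.DataFrame({"Time Occurred": [450, 2320, 545, 5]})

# Zero-pad to four characters so the hour is always the first two digits
s = df["Time Occurred"].astype(str).str.zfill(4)

# Build "HH:MM:00" strings; pd.to_timedelta parses these without a unit
td = pd.to_timedelta(s.str[:2] + ":" + s.str[2:] + ":00")

# The components frame exposes the hour part of each timedelta
df["Hour"] = td.dt.components["hours"]
print(df)
```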

Answer:

pandas.to_datetime will help. Its format parameter converts strings in a given format into datetimes. We parse "HHMM" with format='%H%M' (see the strftime/strptime format codes in Python's datetime module).

To answer the question:

import pandas as pd

# a toy dataframe; 'Time Occurred' holds ints, as in the question
df = pd.DataFrame({'Time Occurred': {0: 450, 1: 2320, 2: 545}})
# convert to str, then parse as HHMM
date = pd.to_datetime(df['Time Occurred'].astype(str), format='%H%M')
# make new columns
df['Hours'] = date.dt.hour
df['Minutes'] = date.dt.minute

print(df)

  Time Occurred  Hours  Minutes
0           450      4       50
1          2320     23       20
2           545      5       45

Note that you can also parse date strings into datetime objects using datetime.strptime.
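A minimal standard-library sketch of that alternative. It also makes visible why zero-padding matters with %H%M: an unpadded 3-digit value such as '130' is read as 13:00 rather than 1:30, because the %H pattern matches the two digits '13' before falling back to one digit. The helper name hour_from_hhmm is hypothetical, chosen for illustration.

```python
from datetime import datetime

# Unpadded, '130' is ambiguous: %H takes '13' and %M takes '0'
print(datetime.strptime("130", "%H%M").hour)  # 13, not 1

# Zero-padding to four characters makes the hour/minute split unambiguous
print(datetime.strptime("130".zfill(4), "%H%M").hour)  # 1

def hour_from_hhmm(value: int) -> int:
    """Hypothetical helper: hour of an HHMM int, parsed after zero-padding."""
    return datetime.strptime(str(value).zfill(4), "%H%M").hour

print([hour_from_hhmm(v) for v in (450, 2320, 545, 1)])  # [4, 23, 5, 0]
```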

Answer:

One way is to use pandas.Series.str.zfill:

import pandas as pd

s = pd.Series([1, 450, 2320, 545, 350, 2100])
pd.to_datetime(s.astype(str).str.zfill(4), format="%H%M")

Output:

0   1900-01-01 00:01:00
1   1900-01-01 04:50:00
2   1900-01-01 23:20:00
3   1900-01-01 05:45:00
4   1900-01-01 03:50:00
5   1900-01-01 21:00:00

Explanation:

  • str.zfill(n) left-pads each string with zeros to a width of at least n characters, so after astype(str) your ints become 4-digit strings:

    0    0001
    1    0450
    2    2320
    3    0545
    4    0350
    5    2100
    
  • pd.to_datetime then parses each padded string according to format, i.e. HHMM (%H%M).
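A sketch of applying this to the question's goal of an "Hour" column, assuming 'Time Occurred' holds non-negative ints no larger than 2359 (toy values below). Because HHMM is just hour * 100 + minute, floor division by 100 gives the same hours with no parsing at all; the datetime route has the advantage that pd.to_datetime raises by default on invalid times such as 1275, which arithmetic would silently turn into hour 12.

```python
import pandas as pd

# Toy data shaped like the question: HHMM times stored as ints
df = pd.DataFrame({"Time Occurred": [1, 450, 2320, 545, 350, 2100]})

# Route 1: pad to "HHMM", parse, and take the hour component
parsed = pd.to_datetime(df["Time Occurred"].astype(str).str.zfill(4), format="%H%M")
df["Hour"] = parsed.dt.hour

# Route 2: HHMM == hour * 100 + minute, so floor division by 100 gives the hour
# (and % 100 would give the minute)
df["Hour_arith"] = df["Time Occurred"] // 100

print(df)
```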
