Home > Back-end >  When time zones are different is a pandas dataframe, how to change time zone?
When time zones are different is a pandas dataframe, how to change time zone?

Time:02-03

I got a dataframe like this:

   State   Timezone    UTC_Time               Parking
0  OH      US/Eastern  2019-01-01 12:00:00    12
1  OH      US/Eastern  2019-01-01 13:00:00    11
2  WI      US/Central  2019-01-01 12:00:00    65
3  WI      US/Central  2019-01-01 13:00:00    67

I want to add a column 'Local_Time' by dt.tz_convert('XXX') where that 'XXX' is what in column Timezone.

Previously when there was only Ohio data I counld simply use

df['Local_Time'] = df['UTC_Time'].dt.tz_convert('US/Eastern')

I tried

df['Local_Time'] = df['UTC_Time'].dt.tz_convert(df['Timezone'])

but failed with ValueError: The truth value of a Series is ambiguous. I suppose there are other numpy manipulations behind.

CodePudding user response:

You can't do this (almost) because pd.DatetimeIndex supports only one timezone but you can have multiple instance of pd.Timestamp inside a Series. What you lost here is the dt accessor because your dtype column will be object instead of datetime64.

df['Local_Time'] = df.apply(lambda x: x['UTC_Time'].tz_localize(x['Timezone']), axis=1)
print(df)

# Output
  State    Timezone             UTC_Time  Parking                 Local_Time
0    OH  US/Eastern  2019-01-01 12:00:00       12  2019-01-01 12:00:00-05:00
1    OH  US/Eastern  2019-01-01 13:00:00       11  2019-01-01 13:00:00-05:00
2    WI  US/Central  2019-01-01 12:00:00       65  2019-01-01 12:00:00-06:00
3    WI  US/Central  2019-01-01 13:00:00       67  2019-01-01 13:00:00-06:00

Check dtype:

>>> df.dtypes
State                 object
Timezone              object
UTC_Time      datetime64[ns]
Parking                int64
Local_Time            object
dtype: object

You have to know, it's totally sub optimal to have multiple timezone into a same Series, mainly because you lose the dt accessor so you have to access value individually and not in a vectorized way.

CodePudding user response:

.dt.tz_convert only works with one timezone passed, so you'll have to do this row-by-row. However, a more efficient way than that would be to group by Timezone and call tz_convert for each group on all the UTC_Time values for that group:

df['Local_Time'] = df.groupby('Timezone', as_index=False).apply(lambda g: g['UTC_Time'].dt.tz_convert(g['Timezone'].iloc[0])).reset_index(drop=True)

Output:

>>> df
  State    Timezone                  UTC_Time  Parking                 Local_Time
0    OH  US/Eastern 2019-01-01 12:00:00 00:00       12  2019-01-01 06:00:00-06:00
1    OH  US/Eastern 2019-01-01 13:00:00 00:00       11  2019-01-01 07:00:00-06:00
2    WI  US/Central 2019-01-01 12:00:00 00:00       65  2019-01-01 07:00:00-05:00
3    WI  US/Central 2019-01-01 13:00:00 00:00       67  2019-01-01 08:00:00-05:00
  •  Tags:  
  • Related