I got a dataframe like this:
State Timezone UTC_Time Parking
0 OH US/Eastern 2019-01-01 12:00:00 12
1 OH US/Eastern 2019-01-01 13:00:00 11
2 WI US/Central 2019-01-01 12:00:00 65
3 WI US/Central 2019-01-01 13:00:00 67
I want to add a column 'Local_Time' by dt.tz_convert('XXX') where that 'XXX' is what in column Timezone.
Previously when there was only Ohio data I counld simply use
df['Local_Time'] = df['UTC_Time'].dt.tz_convert('US/Eastern')
I tried
df['Local_Time'] = df['UTC_Time'].dt.tz_convert(df['Timezone'])
but failed with ValueError: The truth value of a Series is ambiguous. I suppose there are other numpy manipulations behind.
CodePudding user response:
You can't do this (almost) because pd.DatetimeIndex supports only one timezone but you can have multiple instance of pd.Timestamp inside a Series. What you lost here is the dt accessor because your dtype column will be object instead of datetime64.
df['Local_Time'] = df.apply(lambda x: x['UTC_Time'].tz_localize(x['Timezone']), axis=1)
print(df)
# Output
State Timezone UTC_Time Parking Local_Time
0 OH US/Eastern 2019-01-01 12:00:00 12 2019-01-01 12:00:00-05:00
1 OH US/Eastern 2019-01-01 13:00:00 11 2019-01-01 13:00:00-05:00
2 WI US/Central 2019-01-01 12:00:00 65 2019-01-01 12:00:00-06:00
3 WI US/Central 2019-01-01 13:00:00 67 2019-01-01 13:00:00-06:00
Check dtype:
>>> df.dtypes
State object
Timezone object
UTC_Time datetime64[ns]
Parking int64
Local_Time object
dtype: object
You have to know, it's totally sub optimal to have multiple timezone into a same Series, mainly because you lose the dt accessor so you have to access value individually and not in a vectorized way.
CodePudding user response:
.dt.tz_convert only works with one timezone passed, so you'll have to do this row-by-row. However, a more efficient way than that would be to group by Timezone and call tz_convert for each group on all the UTC_Time values for that group:
df['Local_Time'] = df.groupby('Timezone', as_index=False).apply(lambda g: g['UTC_Time'].dt.tz_convert(g['Timezone'].iloc[0])).reset_index(drop=True)
Output:
>>> df
State Timezone UTC_Time Parking Local_Time
0 OH US/Eastern 2019-01-01 12:00:00 00:00 12 2019-01-01 06:00:00-06:00
1 OH US/Eastern 2019-01-01 13:00:00 00:00 11 2019-01-01 07:00:00-06:00
2 WI US/Central 2019-01-01 12:00:00 00:00 65 2019-01-01 07:00:00-05:00
3 WI US/Central 2019-01-01 13:00:00 00:00 67 2019-01-01 08:00:00-05:00
