I have a couple of pandas functions that I use to collect relative start dates for previous time periods. I noticed today on the start of the new month, my business month start (BMS) function returned an unexpected timestamp:
# so.py
import pandas
import time
def now(format='ms', normalize=True):
obj = pandas.Timestamp.now(tz='America/Toronto').normalize()
if normalize == False:
obj = pandas.Timestamp.now(tz='America/Toronto')
if format == 'ms':
obj = int(time.mktime(obj.timetuple()) * 1000)
return(obj)
def BMS(multiplier, format='ms'):
obj = now(format=None) pandas.tseries.offsets.BusinessMonthBegin(multiplier)
obj = pandas.Timestamp(obj).floor(freq='D')
if format == 'ms':
obj = int(time.mktime(obj.timetuple()) * 1000)
return(obj)
print(f'my function: {BMS(-4, format=None)}')
# python3 so.py
2021-10-01 00:00:00-04:00
#
2021-10-01 00:00:00-04:00 is unexpected, because this timestamp was the same timestamp that was returned yesterday:
yesterday = pandas.Timestamp.now(tz='America/Toronto').normalize() - pandas.Timedelta(days=1)
print(f'yesterday: {yesterday pandas.tseries.offsets.BusinessMonthBegin(-4)}')
# yesterday: 2021-10-01 00:00:00-04:00
Since today is a new month, I would expect BMS(-4, format=None) to return
2021-11-01 00:00:00-04:00
In case it might be necessary, a more basic mre to re-produce what my functions are doing is like so:
# MRE
today = pandas.Timestamp.now(tz='America/Toronto').normalize()
print(f'mre: {today pandas.tseries.offsets.BusinessMonthBegin(-4)}')
Update
This morning, the mre returned the expected timestamp
2021-11-01 00:00:00-04:00
Since it rolled over on the second day of the month and not the first day of the month, maybe there's an implicit inclusion of the first day of the month when calculating BusinessMonthBegin?
What am I missing?
CodePudding user response:
If the date falls on the offset, the offset addition already gives the previous bmonth start date (e.g. 2022-02-01 is a business month start date):
import pandas as pd
t_on_offset = pd.Timestamp('2022-02-01')
t_after_offset = pd.Timestamp('2022-02-02')
## on the offset, the offset addition will go back one month already:
t_on_offset pd.tseries.offsets.BusinessMonthBegin(-1)
# Timestamp('2022-01-03 00:00:00')
# it seems what you actually want here is
# t_on_offset pd.tseries.offsets.BusinessMonthBegin(0)
# this just rolls back to the beginning of the BM:
t_after_offset pd.tseries.offsets.BusinessMonthBegin(-1)
# Timestamp('2022-02-01 00:00:00')
You can check if you're on the offset like
pd.tseries.offsets.BusinessMonthBegin().rollback(t_on_offset) == t_on_offset
# True
pd.tseries.offsets.BusinessMonthBegin().rollback(t_after_offset) == t_after_offset
# False
So in your example BMS function (slightly refactored), that could look like
def BMS(timestamp, multiplier, normalize=True, format='ms'):
if pd.tseries.offsets.BusinessMonthBegin().rollback(timestamp) == timestamp:
if multiplier < 0:
multiplier = 1
obj = timestamp pd.tseries.offsets.BusinessMonthBegin(multiplier)
if normalize:
obj = obj.normalize()
if format == 'ms':
return obj.timestamp() * 1000
return(obj)
In action:
for t in pd.Timestamp('2022-01-31'), pd.Timestamp('2022-02-01'), pd.Timestamp('2022-02-02'):
print(f'{str(t)} -> my function: {BMS(-4, t, format=None)}')
2022-01-31 00:00:00 -> my function: 2021-10-01 00:00:00
2022-02-01 00:00:00 -> my function: 2021-11-01 00:00:00
2022-02-02 00:00:00 -> my function: 2021-11-01 00:00:00
