Home > Net >  pandas timeseries offset BusinessMonthBegin doesn't roll over on the new month
pandas timeseries offset BusinessMonthBegin doesn't roll over on the new month

Time:02-05

I have a couple of pandas functions that I use to collect relative start dates for previous time periods. I noticed today on the start of the new month, my business month start (BMS) function returned an unexpected timestamp:

# so.py
import pandas
import time

def now(format='ms', normalize=True):
    obj = pandas.Timestamp.now(tz='America/Toronto').normalize()
    if normalize == False:
        obj = pandas.Timestamp.now(tz='America/Toronto')
    if format == 'ms':
        obj = int(time.mktime(obj.timetuple()) * 1000)
    return(obj)

def BMS(multiplier, format='ms'):
    obj = now(format=None)   pandas.tseries.offsets.BusinessMonthBegin(multiplier)
    obj = pandas.Timestamp(obj).floor(freq='D')
    if format == 'ms':
        obj = int(time.mktime(obj.timetuple()) * 1000)
    return(obj)

print(f'my function: {BMS(-4, format=None)}')

# python3 so.py
2021-10-01 00:00:00-04:00
#

2021-10-01 00:00:00-04:00 is unexpected, because this timestamp was the same timestamp that was returned yesterday:


yesterday = pandas.Timestamp.now(tz='America/Toronto').normalize() - pandas.Timedelta(days=1)
print(f'yesterday: {yesterday   pandas.tseries.offsets.BusinessMonthBegin(-4)}')

# yesterday: 2021-10-01 00:00:00-04:00

Since today is a new month, I would expect BMS(-4, format=None) to return 2021-11-01 00:00:00-04:00

In case it might be necessary, a more basic mre to re-produce what my functions are doing is like so:

# MRE
today = pandas.Timestamp.now(tz='America/Toronto').normalize()
print(f'mre: {today   pandas.tseries.offsets.BusinessMonthBegin(-4)}')

Update This morning, the mre returned the expected timestamp

2021-11-01 00:00:00-04:00

Since it rolled over on the second day of the month and not the first day of the month, maybe there's an implicit inclusion of the first day of the month when calculating BusinessMonthBegin?

What am I missing?

CodePudding user response:

If the date falls on the offset, the offset addition already gives the previous bmonth start date (e.g. 2022-02-01 is a business month start date):

import pandas as pd

t_on_offset = pd.Timestamp('2022-02-01')
t_after_offset = pd.Timestamp('2022-02-02')

## on the offset, the offset addition will go back one month already:
t_on_offset   pd.tseries.offsets.BusinessMonthBegin(-1)
# Timestamp('2022-01-03 00:00:00')

# it seems what you actually want here is
# t_on_offset   pd.tseries.offsets.BusinessMonthBegin(0)

# this just rolls back to the beginning of the BM:
t_after_offset   pd.tseries.offsets.BusinessMonthBegin(-1)
# Timestamp('2022-02-01 00:00:00')

You can check if you're on the offset like

pd.tseries.offsets.BusinessMonthBegin().rollback(t_on_offset) == t_on_offset
# True

pd.tseries.offsets.BusinessMonthBegin().rollback(t_after_offset) == t_after_offset
# False

So in your example BMS function (slightly refactored), that could look like

def BMS(timestamp, multiplier, normalize=True, format='ms'):
    if pd.tseries.offsets.BusinessMonthBegin().rollback(timestamp) == timestamp:
        if multiplier < 0:
            multiplier  = 1
    obj = timestamp   pd.tseries.offsets.BusinessMonthBegin(multiplier)
    
    if normalize:
        obj = obj.normalize()

    if format == 'ms':
        return obj.timestamp() * 1000

    return(obj)

In action:

for t in pd.Timestamp('2022-01-31'), pd.Timestamp('2022-02-01'), pd.Timestamp('2022-02-02'):
    print(f'{str(t)} -> my function: {BMS(-4, t, format=None)}')
    
2022-01-31 00:00:00 -> my function: 2021-10-01 00:00:00
2022-02-01 00:00:00 -> my function: 2021-11-01 00:00:00
2022-02-02 00:00:00 -> my function: 2021-11-01 00:00:00
  •  Tags:  
  • Related