Last week of the year is attributed to the next year

Time:01-05

With pandas.date_range(), the last week of 2021 is labelled as year 2022, even though every day of that week except Saturday and Sunday (1 and 2 January) falls in 2021.

import pandas as pd

for x in pd.date_range(start='2021-12-01', end='2022-01-04', freq='W'):
    print('date: ', x, '\tweek: ', x.week, '\tyear: ', x.year)
Output:

date:  2021-12-05 00:00:00      week:  48       year:  2021
date:  2021-12-12 00:00:00      week:  49       year:  2021
date:  2021-12-19 00:00:00      week:  50       year:  2021
date:  2021-12-26 00:00:00      week:  51       year:  2021
date:  2022-01-02 00:00:00      week:  52       year:  2022

The output makes sense; however, it doesn't work with the filter I am using:

df[(df['date'].dt.year == x.year) & (df['date'].dt.isocalendar().week == x.week)]

Currently the issue is patched with a band-aid, but I'm hoping to have it working properly for next year.

CodePudding user response:

It's a feature, not a bug. Week numbering is based on ISO 8601, specifically: "If 1 January is on a Friday, Saturday or Sunday, it is in week 52 or 53 of the previous year". You need to change your application logic to include that edge case.

https://en.wikipedia.org/wiki/ISO_8601#Week_dates
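The rule can be checked with the standard library alone. This is a minimal sketch; the helper name iso_year_week is made up for illustration and is not part of pandas or datetime.

```python
from datetime import date

def iso_year_week(d):
    """Return (ISO year, ISO week) for a date. Hypothetical helper for illustration."""
    # date.isocalendar() yields (ISO year, ISO week number, ISO weekday).
    return tuple(d.isocalendar())[:2]

# 1 January 2022 is a Saturday, so it still belongs to the last ISO week of 2021.
for d in (date(2021, 12, 27), date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3)):
    print(d, "calendar year:", d.year, "ISO (year, week):", iso_year_week(d))
```

The calendar year and the ISO year differ only for the few days around New Year, which is exactly where a filter that mixes the two breaks.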

Also, according to pandas documentation:

"weekofyear and week have been deprecated. Please use DatetimeIndex.isocalendar().week instead."

If you switch to both x.isocalendar().week and x.isocalendar().year, you will get consistent, although not intuitive, output:

date:  2021-12-19 00:00:00      week:  50       year:  2021
date:  2021-12-26 00:00:00      week:  51       year:  2021
date:  2022-01-02 00:00:00      week:  52       year:  2021
date:  2022-01-09 00:00:00      week:  1        year:  2022
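Applied to the DataFrame filter from the question, the same idea might look like the sketch below. It assumes a datetime64 column named date; the function name filter_iso_week and the sample data are invented for illustration.

```python
import pandas as pd

def filter_iso_week(df, when, column="date"):
    """Return the rows whose `column` lies in the same ISO week as `when`."""
    iso = df[column].dt.isocalendar()          # DataFrame with year, week, day
    target = pd.Timestamp(when).isocalendar()  # (ISO year, ISO week, ISO weekday)
    # Compare ISO year with ISO year and ISO week with ISO week, never mixed
    # with the calendar year.
    mask = (iso["year"] == target[0]) & (iso["week"] == target[1])
    return df[mask]

# Sample data: one row per day around the 2021/2022 boundary.
df = pd.DataFrame({"date": pd.date_range("2021-12-26", "2022-01-03", freq="D")})
last_week = filter_iso_week(df, "2022-01-02")
print(last_week)
```

Both sides of the comparison use ISO values, so 1 and 2 January 2022 land in the same group as 27 to 31 December 2021.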

CodePudding user response:

You could apply your filter to the first day of the week instead, using this property:

pd.Timestamp(2022, 1, 2).to_period('W').start_time

OUTPUT

Timestamp('2021-12-27 00:00:00')

So:

import pandas as pd

d = pd.DataFrame({"date":[pd.Timestamp(2022,1,2)]})

d[d["date"].dt.to_period('W').apply(lambda x: x.start_time.isocalendar()[:2] == (2021, 52))]

OUTPUT

        date
0 2022-01-02
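A vectorized variant of the same idea, avoiding apply, could compare week start dates directly. This is only a sketch: the function name filter_by_week_start and the sample dates are assumptions, and 'W' is taken to mean pandas' default Monday-to-Sunday weekly period.

```python
import pandas as pd

def filter_by_week_start(df, when, column="date"):
    """Return the rows whose `column` falls in the same Monday-Sunday week as `when`."""
    # Map every date to the Monday that starts its weekly period.
    week_start = df[column].dt.to_period("W").dt.start_time
    target_start = pd.Timestamp(when).to_period("W").start_time
    return df[week_start == target_start]

# Sample data straddling the year boundary.
d = pd.DataFrame({"date": pd.to_datetime(
    ["2021-12-26", "2021-12-27", "2022-01-02", "2022-01-03"])})
same_week = filter_by_week_start(d, "2022-01-02")
print(same_week)
```

Because the comparison is on a full timestamp rather than a (year, week) pair, no year-boundary special case is needed.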