pandas.date_range() appears to assign the last week of 2021 to 2022: the week ending on 2 January 2022 is reported as week 52 but year 2022, even though every day in that week except Saturday and Sunday (1 and 2 January) falls in 2021.
import pandas as pd
for x in pd.date_range(start='2021-12-01', end='2022-01-04', freq='W'):
    print('date: ', x, '\tweek: ', x.week, '\tyear: ', x.year)
Output:
date: 2021-12-05 00:00:00 week: 48 year: 2021
date: 2021-12-12 00:00:00 week: 49 year: 2021
date: 2021-12-19 00:00:00 week: 50 year: 2021
date: 2021-12-26 00:00:00 week: 51 year: 2021
date: 2022-01-02 00:00:00 week: 52 year: 2022
The output itself makes sense. However, it breaks the filter I am using, because the calendar year and the week number no longer refer to the same week:
df[(df['date'].dt.year == x.year) & (df['date'].dt.week == x.week)]
For now I have worked around this with a band-aid fix, but I would like it to work properly by next year.
CodePudding user response:
It's a feature, not a bug. Week numbering is based on ISO 8601, specifically: "If 1 January is on a Friday, Saturday or Sunday, it is in week 52 or 53 of the previous year". You need to change your application logic to include that edge case.
https://en.wikipedia.org/wiki/ISO_8601#Week_dates
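As a quick check of the rule quoted above, here is a minimal sketch using only the standard library's datetime.date.isocalendar(), which returns the ISO year, ISO week and ISO weekday. The variable names are just for illustration.

```python
from datetime import date

# 1 January 2022 was a Saturday, so ISO 8601 places it in the
# last week of the previous year.
iso_year, iso_week, iso_weekday = date(2022, 1, 1).isocalendar()
print(iso_year, iso_week, iso_weekday)  # 2021 52 6

# The first ISO week of 2022 starts on Monday, 3 January.
first_monday = tuple(date(2022, 1, 3).isocalendar())
print(first_monday)  # (2022, 1, 1)
```

The important part is that the ISO year can differ from the calendar year, so the week number should always be paired with the ISO year rather than with x.year.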
Also, according to pandas documentation:
"weekofyear and week have been deprecated. Please use DatetimeIndex.isocalendar().week instead."
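If you take both the week and the year from x.isocalendar() (that is, x.isocalendar().week and x.isocalendar().year) instead of mixing the ISO week with the calendar year, you will get consistent, although not intuitive, output: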
date: 2021-12-19 00:00:00 week: 50 year: 2021
date: 2021-12-26 00:00:00 week: 51 year: 2021
date: 2022-01-02 00:00:00 week: 52 year: 2021
date: 2022-01-09 00:00:00 week: 1 year: 2022
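Applied to the filter from the question, this could look roughly like the sketch below. The DataFrame contents are made-up sample data (the real df is not shown in the question), and the column is assumed to be of datetime dtype. It uses Series.dt.isocalendar(), which returns a DataFrame with year, week and day columns.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-12-26",  # Sunday, ISO week 51 of 2021
        "2021-12-27",  # Monday, ISO week 52 of 2021
        "2022-01-01",  # Saturday, still ISO week 52 of 2021
        "2022-01-02",  # Sunday, still ISO week 52 of 2021
        "2022-01-03",  # Monday, ISO week 1 of 2022
    ])
})

x = pd.Timestamp("2022-01-02")

# ISO year/week/day for every row, and for the reference timestamp.
iso = df["date"].dt.isocalendar()
x_iso = x.isocalendar()  # index 0 is the ISO year, index 1 the ISO week

# Compare ISO year with ISO year and ISO week with ISO week.
mask = (iso["year"] == x_iso[0]) & (iso["week"] == x_iso[1])
selected = df[mask]
print(selected)
```

With this sample data the filter keeps 2021-12-27, 2022-01-01 and 2022-01-02, which all belong to ISO week 52 of 2021.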
CodePudding user response:
Alternatively, you could apply your filter to the first day of the week instead of to the date itself, using this property:
pd.Timestamp(2022, 1, 2).to_period('W').start_time
OUTPUT
Timestamp('2021-12-27 00:00:00')
So:
import pandas as pd
d = pd.DataFrame({"date":[pd.Timestamp(2022,1,2)]})
d[d["date"].dt.to_period('W').apply(lambda x: x.start_time.isocalendar()[:2] == (2021, 52))]
OUTPUT
date
0 2022-01-02
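A possible variation on the same idea, sketched here with made-up sample dates: since the default 'W' period runs from Monday to Sunday, the weekly periods can be compared directly, which avoids both the apply call and the hard-coded (2021, 52) tuple.

```python
import pandas as pd

# Hypothetical sample data; only the rows in the same week as x should remain.
d = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-26", "2021-12-27", "2022-01-02", "2022-01-03"])
})
x = pd.Timestamp(2022, 1, 2)

# Two dates share a week exactly when their weekly periods are equal.
same_week = d["date"].dt.to_period("W") == x.to_period("W")
result = d[same_week]
print(result)
```

Here only 2021-12-27 and 2022-01-02 remain, because both fall in the period from 2021-12-27 to 2022-01-02.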
