So I am really new to this and struggling with something, which I feel should be quite simple.
I have a Pandas Dataframe containing two columns: Fiscal Week (str) and Amount sold (int).
| Fiscal Week | Amount sold | |
|---|---|---|
| 0 | 2019031 | 24 |
| 1 | 2019041 | 47 |
| 2 | 2019221 | 34 |
| 3 | 2019231 | 46 |
| 4 | 2019241 | 35 |
My problem is the fiscal week column. It contains strings which describe the fiscal year and week . The fiscal year for this purpose starts on October 1st and ends on September 30th. So basically, 2019031 is the Monday (the 1 at the end) of the third week of October 2019. And 2019221 would be the 2nd week of March 2020. The issue is that I want to turn this data into timeseries later. But I can't do that with the data in string format - I need it to be in date time format.
I actually added the 1s at the end of all these strings using
df['Fiscal Week']= df['Fiscal Week'].map('{}1'.format)
so that I can then turn it into a proper date:
df['Fiscal Week'] = pd.to_datetime(df['Fiscal Week'], format="%Y%W%w")
as I couldn't figure out how to do it with just the weeks and no day defined. This, of course, returns the following:
| Fiscal Week | Amount sold | |
|---|---|---|
| 0 | 2019-01-21 | 24 |
| 1 | 2019-01-28 | 47 |
| 2 | 2019-06-03 | 34 |
| 3 | 2019-06-10 | 46 |
| 4 | 2019-06-17 | 35 |
As expected, this is clearly not what I need, as according to the definition of the fiscal year week 1 is not January at all but rather October.
Is there some simple solution to get the dates to what they are actually supposed to be?
Ideally I would like the final format to be e.g. 2019-03 for the first entry. So basically exactly like the string but in some kind of date format, that I can then work with later on. Alternatively, calendar weeks would also be fine.
CodePudding user response:
Assuming you have a data frame with fiscal dates of the form 'YYYYWW' where YYY = the calendar year of the start of the fiscal year and ww = the number of weeks into the year, you can convert to calendar dates as follows:
def getCalendarDate(fy_date: str):
f_year = fy_date[0:4]
f_week = fy_date[4:]
fys = pd.to_datetime(f'{f_year}/10/01', format= '%Y/%m/%d')
return fys pd.to_timedelta(int(f_week), "W")
You can then use this function to create the column of calendar dates as follows:
df['Calendar Date]'] = list(getCalendarDate(x) for x in df['Fiscal Week'].to_list())
