hy_code date week_date last_week_date
340 880301 2006/10/31 2006/11/5 NaT
341 880301 2006/11/1 2006/11/5 NaT
342 880301 2006/11/2 2006/11/5 NaT
343 880301 2006/11/3 2006/11/5 NaT
355 880301 2006/11/21 2006/11/26 2006/11/5
482916 880969 2021/12/24 2021/12/26 NaT
482918 880969 2021/12/28 2022/1/2 2021/12/26
482919 880969 2021/12/29 2022/1/2 2021/12/26
482920 880969 2021/12/30 2022/1/2 2021/12/26
482921 880969 2021/12/31 2022/1/2 2021/12/26
Goal
- I want to get last week date for each group week date.The expected result is above.
Try
df['new_week_date']=tmp_df.groupby(['hy_code','week_date'])['week_date'].shift(1), but failed.
hy_code date week_date last_week_date
340 880301 2006-10-31 2006-11-05 NaT
341 880301 2006-11-01 2006-11-05 2006-11-05
342 880301 2006-11-02 2006-11-05 2006-11-05
343 880301 2006-11-03 2006-11-05 2006-11-05
355 880301 2006-11-21 2006-11-26 NaT
482916 880969 2021-12-24 2021-12-26 NaT
482918 880969 2021-12-28 2022-01-02 2021-12-26
482919 880969 2021-12-29 2022-01-02 2022-01-02
482920 880969 2021-12-30 2022-01-02 2022-01-02
482921 880969 2021-12-31 2022-01-02 2022-01-02
CodePudding user response:
You can first drop_duplicates, then create the shift week date and merge back
out = df.drop_duplicates(['hy_code','week_date']).copy()
out['lastweekdate'] = out.groupby('hy_code')['week_date'].shift(1)
df = df.merge(out.drop('date',1),how='left')
df
Out[233]:
hy_code date week_date lastweekdate
0 880301 2006/10/31 2006/11/5 NaN
1 880301 2006/11/1 2006/11/5 NaN
2 880301 2006/11/2 2006/11/5 NaN
3 880301 2006/11/3 2006/11/5 NaN
4 880301 2006/11/21 2006/11/26 2006/11/5
5 880969 2021/12/24 2021/12/26 NaN
6 880969 2021/12/28 2022/1/2 2021/12/26
7 880969 2021/12/29 2022/1/2 2021/12/26
8 880969 2021/12/30 2022/1/2 2021/12/26
9 880969 2021/12/31 2022/1/2 2021/12/26
CodePudding user response:
You can first compute the shifted week as a series with a double groupby:
s = (df
.groupby(['hy_code','week_date'],sort=False)['week_date'].first()
.groupby('hy_code').shift()
.rename('last_week_date')
)
hy_code week_date
880301 2006/11/5 NaN
2006/11/26 2006/11/5
880969 2021/12/26 NaN
2022/1/2 2021/12/26
Name: last_week_date, dtype: object
Then merge it to the original data:
# ensure last_week_date is not pre-existing
# df = df.drop(columns='last_week_date')
# merge
df.merge(s, left_on=['hy_code','week_date'], right_index=True)
Output:
hy_code date week_date last_week_date
340 880301 2006/10/31 2006/11/5 NaN
341 880301 2006/11/1 2006/11/5 NaN
342 880301 2006/11/2 2006/11/5 NaN
343 880301 2006/11/3 2006/11/5 NaN
355 880301 2006/11/21 2006/11/26 2006/11/5
482916 880969 2021/12/24 2021/12/26 NaN
482918 880969 2021/12/28 2022/1/2 2021/12/26
482919 880969 2021/12/29 2022/1/2 2021/12/26
482920 880969 2021/12/30 2022/1/2 2021/12/26
482921 880969 2021/12/31 2022/1/2 2021/12/26
