Keep the first value using pandas diff()

Time:01-28

I have a DataFrame as shown below:

Note: Datetime is the index
           Name   target_mtd
Datetime 
2021-12-01 Amy     1000
2021-12-02 Amy     2500
2021-12-03 Amy     4000
2021-12-01 Bobo    2000
2021-12-02 Bobo    3000
2021-12-03 Bobo    4000

I would like to convert the month-to-date column target_mtd into daily values within each group, so I ran the following code:

df['target_daily'] = df.groupby([df.index.month, 'Name'])['target_mtd'].transform(lambda x:x.diff())

It gives a result that is not what I expected:

           Name   target_mtd  target_daily
Datetime 
2021-12-01 Amy     1000         NaN
2021-12-02 Amy     2500         1500
2021-12-03 Amy     4000         1500
2021-12-01 Bobo    2000         NaN
2021-12-02 Bobo    3000         1000
2021-12-03 Bobo    4000         1000

The expected result keeps the first value of each group:

           Name   target_mtd  target_daily
Datetime 
2021-12-01 Amy     1000         1000
2021-12-02 Amy     2500         1500
2021-12-03 Amy     4000         1500
2021-12-01 Bobo    2000         2000
2021-12-02 Bobo    3000         1000
2021-12-03 Bobo    4000         1000

Thanks!

CodePudding user response:

You can replace the missing values with the original column using Series.fillna:

df['target_daily'] = (df.groupby([df.index.month, 'Name'])['target_mtd']
                        .diff()
                        .fillna(df['target_mtd']))
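As a rough end-to-end check, the sketch below rebuilds the sample data from the question and applies the same diff-then-fillna idea. The final cast to int64 is an extra step that is not part of the answer; it is included only on the assumption that integer output is wanted, since diff() returns floats once NaN is involved.

```python
import pandas as pd

# Sample data copied from the question; Datetime is the index.
df = pd.DataFrame(
    {
        "Name": ["Amy", "Amy", "Amy", "Bobo", "Bobo", "Bobo"],
        "target_mtd": [1000, 2500, 4000, 2000, 3000, 4000],
    },
    index=pd.DatetimeIndex(
        ["2021-12-01", "2021-12-02", "2021-12-03"] * 2, name="Datetime"
    ),
)

# diff() has no previous row to subtract for the first row of each group,
# so that row comes back as NaN.
daily = df.groupby([df.index.month, "Name"])["target_mtd"].diff()

# Fill those NaN rows with the month-to-date value from the same row.
df["target_daily"] = daily.fillna(df["target_mtd"])

# Optional (an assumption, not in the answer): cast the floats back to integers.
df["target_daily"] = df["target_daily"].astype("int64")

print(df)
```

On the first day of each group the month-to-date value and the daily value are the same thing, which is why filling from target_mtd gives the expected column.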

If the data spans multiple years, group by monthly periods instead, so that the same month in different years is treated as a separate group:

df['target_daily'] = (df.groupby([df.index.to_period('M'), 'Name'])['target_mtd']
                        .diff()
                        .fillna(df['target_mtd']))
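To see why the month number alone is not enough, here is a small sketch with made-up data (the values and the two-year span are hypothetical, not from the question). It compares grouping by df.index.month with grouping by df.index.to_period('M').

```python
import pandas as pd

# Hypothetical data: one person, the first two days of December in two years.
df = pd.DataFrame(
    {
        "Name": ["Amy"] * 4,
        "target_mtd": [1000, 2500, 800, 2000],
    },
    index=pd.DatetimeIndex(
        ["2021-12-01", "2021-12-02", "2022-12-01", "2022-12-02"], name="Datetime"
    ),
)

# Grouping by month number puts both Decembers into one group, so the first
# day of December 2022 is diffed against the last row of December 2021.
by_month = (
    df.groupby([df.index.month, "Name"])["target_mtd"]
    .diff()
    .fillna(df["target_mtd"])
)

# Grouping by monthly period keeps 2021-12 and 2022-12 apart, so each month
# starts again from its own first value.
by_period = (
    df.groupby([df.index.to_period("M"), "Name"])["target_mtd"]
    .diff()
    .fillna(df["target_mtd"])
)

print(by_month.tolist())   # [1000.0, 1500.0, -1700.0, 1200.0]
print(by_period.tolist())  # [1000.0, 1500.0, 800.0, 1200.0]
```

The -1700 in the first result is the cross-year leak; the period-based grouping avoids it.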

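Alternatively, group with pd.Grouper at monthly frequency, which also treats the same month in different years as separate groups (this needs a DatetimeIndex; pandas 2.2 and later prefer the alias 'ME' over 'M'):

df['target_daily'] = (df.groupby([pd.Grouper(freq='M'), 'Name'])['target_mtd']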
                        .diff()
                        .fillna(df['target_mtd']))