cumsum on subset of pandas df columns

Time:02-04

I have a pandas dataframe as follows:

Date         Week   Value1   Value2   Value3
2022-01-01   1      -10      20       30
2022-01-02   1      -5       25       20
2022-01-03   1      0        15       NaN
2022-01-04   1      5        7        10
2022-01-05   1      7        10       15
2022-01-06   1      10       5        NaN

I am looking to perform a cumulative sum such that the resulting DF is as follows:

Date         Week   Value1   Value2   Value3
2022-01-03   1      -15      60       50
2022-01-06   1      22       22       25

Only Value3 contains NaN values; no other column does. I want to sum the three Value columns over each run of rows that ends at a NaN in Value3, with that NaN row included in the run. Date and Week should be taken as-is from the row where the NaN occurs, so the sum applies only to the Value columns. So far I have tried variations of the following, without success:

df.groupby(['Date', 'Week'])[['Value1', 'Value2', 'Value3']].apply(lambda x: x.isna().cumsum().reset_index(drop=True))

But I haven't got the desired result with this. Any ideas on how this can be achieved? Thanks!

CodePudding user response:

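Use a groupby on the cumulative count of NaNs in Value3. Shifting Value3 down by one row first makes each NaN row the last row of its group instead of the first row of the next one. Then take the last Date and Week of each group and sum the Value columns (sum skips NaN by default):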

(
    df.groupby(df['Value3'].shift().isna().cumsum())
      .agg({'Date': 'last', 'Week': 'last', 'Value1': 'sum', 'Value2': 'sum', 'Value3': 'sum'})
      .reset_index(drop=True)
)

Output:

    Date       Week Value1  Value2  Value3
0   2022-01-03  1   -15     60      50.0
1   2022-01-06  1   22      22      25.0
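
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same groupby. The names `key` and `result` are illustrative only, and the Date column is assumed to hold plain strings; a datetime column should behave the same way with 'last'.

```python
import numpy as np
import pandas as pd

# Sample data from the question (Date kept as plain strings: an assumption)
df = pd.DataFrame({
    "Date": ["2022-01-01", "2022-01-02", "2022-01-03",
             "2022-01-04", "2022-01-05", "2022-01-06"],
    "Week": [1, 1, 1, 1, 1, 1],
    "Value1": [-10, -5, 0, 5, 7, 10],
    "Value2": [20, 25, 15, 7, 10, 5],
    "Value3": [30, 20, np.nan, 10, 15, np.nan],
})

# Group key: shift Value3 down one row, flag NaNs, and take a running count.
# A new group starts on the row right after each NaN, so the NaN row itself
# closes its group. The first row always starts a group because shift()
# puts a NaN there.
key = df["Value3"].shift().isna().cumsum()

# Last Date/Week per group, sums for the Value columns (sum skips NaN).
result = (
    df.groupby(key)
      .agg({"Date": "last", "Week": "last",
            "Value1": "sum", "Value2": "sum", "Value3": "sum"})
      .reset_index(drop=True)
)

print(key.tolist())
print(result)
```

With this key, any rows after the final NaN would form one more group of their own, and two consecutive NaN rows would put the second NaN in a single-row group, so it is worth checking whether that matches what you want for your real data.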