I have such a dataframe:
DATE | VALUE
23/07/21 01 | 20
23/07/21 02 | 10
23/07/21 03 | 15
23/07/21 04 | 200
What I want to get is new dataframe which sums up the entries of two consecutive rows of column VALUE and keeps the entry of the second row for column DATE.
It should look like this one:
DATE | VALUE
23/07/21 02 | 30
23/07/21 04 | 215
How can I do this?
CodePudding user response:
Try:
df.groupby(np.arange(len(df))//2).agg({'DATE':'last', 'VALUE':'sum'})
CodePudding user response:
You can try with rolling:
>>> df.set_index("DATE").rolling(2).sum().iloc[1::2].reset_index()
DATE VALUE
0 2021-07-21 02 30.0
1 2021-07-21 04 115.0
This becomes considerably faster with a larger DataFrame:
df = pd.concat([df]*1000)
>>> %timeit df.set_index("DATE").rolling(2).sum().iloc[1::2].reset_index()
1.33 ms ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df.groupby(np.arange(len(df))//2).agg({'DATE':'last', 'VALUE':'sum'})
2.15 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
