Given a dataframe df as follows:
date value 20211003 20211010 20211017
0 2021-9-19 3613.9663 NaN NaN NaN
1 2021-9-26 3613.0673 NaN NaN NaN
2 2021-10-3 3568.1668 NaN NaN NaN
3 2021-10-10 3592.1666 3510.221000 NaN NaN
4 2021-10-17 3572.3662 3465.737012 3534.220800 NaN
5 2021-10-24 3582.6036 3479.107035 3539.856801 3514.420400
6 2021-10-31 3547.3361 3421.161235 3481.911001 3456.474600
7 2021-11-7 3491.5677 3370.140147 3439.284539 3416.621024
8 2021-11-14 3539.1002 3319.289523 3391.930037 3370.079953
9 2021-11-21 3560.3734 3261.343723 3333.984237 3312.134153
10 2021-11-28 3564.0894 3255.328902 3338.967086 3305.054247
11 2021-12-5 3607.4320 3313.274702 3396.912886 3363.000047
12 2021-12-12 3666.3479 3371.220502 3450.172564 3412.234440
13 2021-12-19 3632.3638 NaN 3466.930383 3428.683490
14 2021-12-26 3618.0535 NaN NaN 3370.737690
Let's say the columns after value column (20211003, 20211010 and 20211017) are rolling forecast result of value, instead of 10 values for each column, I'll need to keep 3 values only and get the result as follows:
date value 20211003 20211010 20211017
0 2021-9-19 3613.9663 NaN NaN NaN
1 2021-9-26 3613.0673 NaN NaN NaN
2 2021-10-3 3568.1668 NaN NaN NaN
3 2021-10-10 3592.1666 3510.221000 NaN NaN
4 2021-10-17 3572.3662 3465.737012 3534.220800 NaN
5 2021-10-24 3582.6036 3479.107035 3539.856801 3514.420400
6 2021-10-31 3547.3361 NaN 3481.911001 3456.474600
7 2021-11-7 3491.5677 NaN NaN 3416.621024
How could I achieve that in Pandas? Thanks.
CodePudding user response:
Assuming there is a single stretch per column, use a cumcount of the non-NA values to determine which ones to keep, then slice and reindex each column:
cols = ['20211003', '20211010', '20211017']
N = 3
cc = df[cols].apply(lambda s: s.groupby((m:=s.notna())).cumcount().where(m))
m1 = ~cc.min(1).gt(N)
masks = ~cc.ge(N)
out = df.apply(lambda s: s[masks[s.name] if s.name in masks else m1]
.reset_index(drop=True))
output:
date value 20211003 20211010 20211017
0 2021-9-19 3613.9663 NaN NaN NaN
1 2021-9-26 3613.0673 NaN NaN NaN
2 2021-10-3 3568.1668 NaN NaN NaN
3 2021-10-10 3592.1666 3510.221000 NaN NaN
4 2021-10-17 3572.3662 3465.737012 3534.220800 NaN
5 2021-10-24 3582.6036 3479.107035 3539.856801 3514.420400
6 2021-10-31 3547.3361 NaN 3481.911001 3456.474600
7 2021-11-7 3491.5677 NaN NaN 3416.621024
8 2021-11-14 3539.1002 NaN NaN NaN
