I have the table below and want to fill down the Stage within each Category, based on this condition:
if Stage = "Delivered", fill "Delivered" down to all following rows; else if Stage = "Paid", fill "Paid" down to all following rows.
| Category | Date | Stage |
|---|---|---|
| A | 2021-11-01 | Ordered |
| A | 2021-12-01 | Paid |
| A | 2022-01-01 | |
| B | 2021-08-01 | |
| B | 2021-09-01 | Ordered |
| B | 2021-10-01 | Paid |
| B | 2021-11-01 | Ordered |
| B | 2021-12-01 | Delivered |
The result should look like:
| Category | Date | Stage |
|---|---|---|
| A | 2021-11-01 | Ordered |
| A | 2021-12-01 | Paid |
| A | 2022-01-01 | Paid |
| B | 2021-08-01 | |
| B | 2021-09-01 | Ordered |
| B | 2021-10-01 | Paid |
| B | 2021-11-01 | Paid |
| B | 2021-12-01 | Delivered |
Could anyone help? I would really appreciate it!
CodePudding user response:
Assuming your dataframe is already sorted by the Date column, you can combine mask, a grouped ffill, and combine_first:
```python
df['Stage'] = (
    df['Stage']
    .mask(~df['Stage'].isin(['Paid', 'Delivered']))  # blank out every other stage
    .groupby(df['Category']).ffill()                 # carry Paid/Delivered down within each category
    .combine_first(df['Stage'])                      # restore original values where nothing was filled
)
print(df)
```

Output:

```
  Category        Date      Stage
0        A  2021-11-01    Ordered
1        A  2021-12-01       Paid
2        A  2022-01-01       Paid
3        B  2021-08-01
4        B  2021-09-01    Ordered
5        B  2021-10-01       Paid
6        B  2021-11-01       Paid
7        B  2021-12-01  Delivered
```
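If you want to try this end to end, here is a minimal, self-contained sketch. It rebuilds the sample table from the question, assuming the blank stages are empty strings (if yours are NaN it should behave the same way). The helper name `fill_stage` and its `sticky` parameter are made up for illustration. It uses `where`, which is just `mask` with the condition flipped, and it sorts by Category and Date first so the "already sorted" assumption is not needed.

```python
import pandas as pd

# Sample data from the question; blank stages are assumed to be empty strings.
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "Date": pd.to_datetime([
        "2021-11-01", "2021-12-01", "2022-01-01",
        "2021-08-01", "2021-09-01", "2021-10-01", "2021-11-01", "2021-12-01",
    ]),
    "Stage": ["Ordered", "Paid", "", "", "Ordered", "Paid", "Ordered", "Delivered"],
})


def fill_stage(frame, sticky=("Paid", "Delivered")):
    """Hypothetical helper: carry each sticky stage down within its category."""
    # Sort so that "next rows" means later dates within the same category.
    out = frame.sort_values(["Category", "Date"]).copy()
    # Keep only the sticky stages; every other value becomes NaN.
    anchors = out["Stage"].where(out["Stage"].isin(list(sticky)))
    # Forward-fill per category, then fall back to the original value
    # for rows that come before the first sticky stage.
    out["Stage"] = (
        anchors.groupby(out["Category"]).ffill().combine_first(out["Stage"])
    )
    return out


result = fill_stage(df)
print(result)
```

Because the fill is grouped by Category, a "Paid" at the end of category A never leaks into the first rows of category B.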
