Below is my current df where it has a 1 if the id goes through a given stage and this is constructed by having a 1 if it is currently in a given stage or if it has gone through that stage in the past.
| id | stage 1 | stage 2 | stage 3 | stage 4 | stage 5 | stage 6 | close lost | close won |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
| 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 2 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
This next table is what I am looking to generate. I want to only know if a given id has skipped a stage or not and I want it to be sequential like it is in the previous table.
| id | stage 1 skip | stage 2 skip | stage 3 skip | stage 4 skip | stage 5 skip | stage 6 skip | is_closed |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
CodePudding user response:
Solution
- Set the index to
idthen change thedtypeof remaining columns to boolean - Invert the columns and calculate the cumulative maximum along the columns axis, this would serve as the indicator which shows the last available stage that an
idhas gone thorugh - Negate the logical mask from step 1 so that it will show us stages that are skipped by given
id, then take the logical and with the mask from step 2. The resulting mask will give us the indicator of which stages are skipped by the givenid
m = df.set_index('id').astype(bool)
s = (m.loc[:, ::-1].cummax(1).loc[:, ::-1] & ~m)
s = s.drop(['close lost', 'close won'], axis=1).astype(int).add_suffix(' skip')
s['is_closed'] = (m['close lost'] | m['close won']).astype(int)
print(s)
stage 1 skip stage 2 skip stage 3 skip stage 4 skip stage 5 skip stage 6 skip is_closed
id
1 0 0 0 0 0 0 0
1 0 1 1 0 0 0 0
1 0 1 1 0 0 0 0
1 0 1 1 0 0 1 1
2 1 0 0 0 0 0 0
2 1 0 1 0 0 0 0
2 1 0 1 0 1 0 0
2 1 0 1 0 1 0 1
