I have a pandas dataframe with multiple columns, which looks like the following:
| Index | ID | Year | Code | #Purchase | Mode |
|---|---|---|---|---|---|
| 0 | 100 | 2018 | ABC | 1 | 1 |
| 1 | 100 | 2019 | DEF | 2 | 2 |
| 2 | 100 | 2019 | GHI | 3 | 3 |
| 3 | 102 | 2018 | JKL | 4 | 2 |
| 4 | 103 | 2019 | MNO | 5 | 1 |
| 5 | 103 | 2020 | PQR | 6 | 2 |
| 6 | 102 | 2019 | PQR | 3 | 3 |
| 7 | 104 | 2019 | LMN | 3 | 1 |
| 7 | 104 | 2021 | LMN | 1 | 3 |
I want to group rows w.r.t to ID and then pivot the results and would only like to keep IDs that have an entry against Mode_1 then Mode_2 and then Mode_3. The result should look like this:
| Index | ID | Year_1 | Code_1 | #Purchase_1 | Mode_1 | Year_2 | Code_2 | #Purchase_2 | Mode_2 | Year_3 | Code_3 | #Purchase_3 | Mode_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | 2018 | ABC | 1 | 1 | 2019 | DEF | 2 | 2 | 2019 | GHI | 3 | 3 |
| 1 | 103 | 2019 | MNO | 5 | 1 | 2020 | PQR | 6 | 2 | NaN | NaN | NaN | NaN |
In the example results, we can see that ID 102 is dropped because it doesn't have a Mode = 1, and ID 104 is dropped because the Mode value skips 2. So the possible combinations of Mode against an ID needed are 1, 1-->2, or 1-->2-->3.
It would be really appreciated if someone could help me with this example problem. TIA
CodePudding user response:
Try to the melt the dataframe with required column then do the pivot the results
CodePudding user response:
Use:
#filter if difference is 1 per groups
m1 = df.groupby('ID')['Mode'].transform(lambda x: x.diff().iloc[1:].eq(1).all())
#filter if first value per group is 1
m2 = df.groupby('ID')['Mode'].transform('first').eq(1)
#pivoting all columns by ID per groups g created by copy Mode column
df = df[m1 & m2].assign(g = lambda x: x['Mode']).pivot(index='ID', columns='g').sort_index(axis=1, level=1)
#flatten MultiIndex
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()
print (df)
ID #Purchase_1 Code_1 Mode_1 Year_1 #Purchase_2 Code_2 Mode_2 \
0 100 1.0 ABC 1.0 2018.0 2.0 DEF 2.0
1 103 5.0 MNO 1.0 2019.0 6.0 PQR 2.0
Year_2 #Purchase_3 Code_3 Mode_3 Year_3
0 2019.0 3.0 GHI 3.0 2019.0
1 2020.0 NaN NaN NaN NaN
