I have a dataframe like this:
| Sequence | Message |
|---|---|
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 1 | A |
| 2 | C |
| 3 | D |
| 4 | E |
| 5 | F |
| 1 | A |
| 2 | C |
| 3 | D |
And I'm trying to pivot it to have table with maximum Sequence length as columns and corresponding possible Message as values.
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|
| A | B | C | D | |||
| A | C | D | E | F | ||
| A | C | D |
I'm trying to use pd.get_dummies() for Sequence , but can't figure out how to keep Message values.
Appreciate any help.
CodePudding user response:
A possible solution:
aux = df.groupby('Sequence').agg(list).T.iloc[0,:].rename_axis(None)
(pd.concat([pd.DataFrame(np.array(x)[:,None]) for x in aux], axis=1)
.set_axis(aux.index, axis=1))
Output:
1 2 3 4 5
0 A B C D F
1 A C D E NaN
2 A C D NaN NaN
CodePudding user response:
Use a pivot with df['Sequence'].diff().le(0).cumsum() as index (new row when the Sequence restarts to an earlier number):
(df.assign(index=df['Sequence'].diff().le(0).cumsum())
.pivot(index='index', columns='Sequence', values='Message')
.rename_axis(index=None, columns=None) # optional
)
Output:
1 2 3 4 5
0 A B C D NaN
1 A C D E F
2 A C D NaN NaN
