I have a DataFrame like this:
| id | month |
|---|---|
| 1 | 1 |
| 1 | 3 |
| 1 | 4 |
| 1 | 6 |
I want to transform it so that it looks like this:
| id | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 0 | 1 |
I've tried using this code:
ndf = df[['id']].join(pd.get_dummies(df['month'])).groupby('id').max()
But the result looks like this:
| id | 1 | 3 | 4 | 6 |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
How can I include the missing columns (2 and 5) even though those months are not in the data?
CodePudding user response:
You can use pd.crosstab instead, then build the full set of columns with pd.RangeIndex from the min and max month, and finally use DataFrame.reindex (optionally followed by DataFrame.reset_index):
import pandas as pd
# RangeIndex excludes its stop value, so add 1 to include the max month
new_cols = pd.RangeIndex(df['month'].min(), df['month'].max() + 1)
res = (
pd.crosstab(df['id'], df['month'])
.reindex(columns=new_cols, fill_value=0)
.reset_index()
)
Output:
>>> res
id 1 2 3 4 5 6
0 1 1 0 1 1 0 1
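If you would rather keep the get_dummies approach from the question, the same reindex step should work there too. This is a minimal sketch, not the only way to do it: it rebuilds the sample frame from the question, and the name all_months is just an illustrative choice. Passing dtype=int to get_dummies is an assumption that you want 0/1 integers rather than booleans.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id": [1, 1, 1, 1], "month": [1, 3, 4, 6]})

# Every month between the smallest and largest one present.
# RangeIndex excludes its stop value, hence the + 1.
all_months = pd.RangeIndex(df["month"].min(), df["month"].max() + 1)

ndf = (
    df[["id"]]
    .join(pd.get_dummies(df["month"], dtype=int))  # one 0/1 column per month seen
    .groupby("id")
    .max()                                         # 1 if the id has that month at all
    .reindex(columns=all_months, fill_value=0)     # add missing months as all-zero columns
    .reset_index()
)
```

One difference to keep in mind: pd.crosstab counts occurrences, so an id with the same month listed twice would show 2 there, whereas get_dummies followed by max always gives 0 or 1.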
