This is my dataframe :
| col_1 | col_2 | col_3 | |
|---|---|---|---|
| 1 | A | abc | x |
| 2 | A | abc | y |
| 3 | A | abc | z |
| 4 | B | bcd | x |
| 5 | B | bcd | z |
| 6 | C | cde | x |
| 7 | D | def | y |
and this is what I want :
| col_1 | col_2 | col_x | col_y | col_z | |
|---|---|---|---|---|---|
| 1 | A | abc | 1 | 1 | 1 |
| 4 | B | bcd | 1 | 0 | 1 |
| 6 | C | cde | 1 | 0 | 0 |
| 7 | D | def | 0 | 1 | 0 |
I try with df.pivot but I don't known how insert 0 and 1 and how to drop unused rows at once.
CodePudding user response:
One option is to execute a pd.get_dummies before grouping:
(pd.get_dummies(df, columns=['col_3'], prefix='col')
.groupby(['col_1', 'col_2'], as_index = False)
.sum()
)
col_1 col_2 col_x col_y col_z
0 A abc 1 1 1
1 B bcd 1 0 1
2 C cde 1 0 0
3 D def 0 1 0
CodePudding user response:
Use crosstab with DataFrame.add_prefix and then DataFrame.clip for only 0, 1 values if duplicates in data. If no duplicates is possible remove clip:
df = (pd.crosstab([df['col_1'], df['col_2']], df['col_3'])
.add_prefix('col_')
.clip(upper=1)
.reset_index()
.rename_axis(None, axis=1))
print (df)
col_1 col_2 col_x col_y col_z
0 A abc 1 1 1
1 B bcd 1 0 1
2 C cde 1 0 0
3 D def 0 1 0
