Here is what my CSV looks like:
| time | cause |
|---|---|
| 23 | a / b / c |
| 42 | c / d / a / b |
| 12 | a / d / e |
| 98 | c / b / e / d |
And this is the output I am trying to achieve:
| time | a | b | c | d | e |
|---|---|---|---|---|---|
| 23 | 1 | 1 | 1 | 0 | 0 |
| 42 | 1 | 1 | 1 | 1 | 0 |
| 12 | 1 | 0 | 0 | 1 | 1 |
| 98 | 0 | 1 | 1 | 1 | 1 |
My real data is much larger, but a solution for this example should get me what I am looking for. I cannot figure out how to use the map function to check for multiple possible values in every cell.
CodePudding user response:
You can use `str.get_dummies` and join the result back to the `time` column of the original DataFrame:
df[['time']].join(df['cause'].str.get_dummies(sep=' / '))
Or use `pop` to remove the `cause` column from the original DataFrame and reassign the joined result to `df`:
df = df.join(df.pop('cause').str.get_dummies(sep=' / '))
Output:
   time  a  b  c  d  e
0    23  1  1  1  0  0
1    42  1  1  1  1  0
2    12  1  0  0  1  1
3    98  0  1  1  1  1
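For anyone who wants to reproduce this, here is a minimal self-contained sketch of the first approach. The DataFrame is built by hand from the sample rows in the question rather than read from a file, and the names `dummies` and `result` are only illustrative.

```python
import pandas as pd

# Sample rows from the question, typed in by hand (assumption: the real
# data would come from pd.read_csv and uses the same " / " separator).
df = pd.DataFrame({
    "time": [23, 42, 12, 98],
    "cause": ["a / b / c", "c / d / a / b", "a / d / e", "c / b / e / d"],
})

# Split each cell on " / " and build one 0/1 indicator column
# per distinct value found in the column.
dummies = df["cause"].str.get_dummies(sep=" / ")

# Keep only "time" from the original frame and attach the indicators
# (the join aligns on the row index).
result = df[["time"]].join(dummies)
print(result)
```

Note that `sep` has to match the delimiter as it appears in the data, including the surrounding spaces. If the real file mixes `a / b` and `a/b`, normalizing the column first is probably needed.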
