Use pandas map on a complex column

Here is what my CSV looks like:

time	cause
23	a / b / c
42	c / d / a / b
12	a / d / e
98	c / b / e / d
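The listing above looks tab-separated. Here is a minimal sketch of loading it with pandas, assuming tab separators. The sample is inlined through io.StringIO so the snippet runs without a file. Pass your real path instead, and change sep if the file is actually comma-separated.

```python
import io

import pandas as pd

# Sample rows from the question, inlined. Assumes tab-separated fields
# as in the listing above; replace io.StringIO(raw) with a real file path.
raw = (
    "time\tcause\n"
    "23\ta / b / c\n"
    "42\tc / d / a / b\n"
    "12\ta / d / e\n"
    "98\tc / b / e / d\n"
)

df = pd.read_csv(io.StringIO(raw), sep="\t")
print(df)
```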

And this is the output I am trying to achieve:

time	a	b	c	d	e
23	1	1	1	0	0
42	1	1	1	1	0
12	1	0	0	1	1
98	0	1	1	1	1

My real data is much larger, but this example should illustrate what I need. I cannot figure out how to use the map function to check for multiple possible values in every cell.

Answer:

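You can use str.get_dummies, which splits each cell on the separator and creates one 0/1 column per distinct value, and then join the result back to the original dataframe: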

df[['time']].join(df['cause'].str.get_dummies(sep=' / '))
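Here is a self-contained sketch of this first approach. The sample data is built inline as a DataFrame, which stands in for reading the real file. str.get_dummies produces its indicator columns in sorted order, and join aligns on the index, so the rows keep their original order. The original df is left unchanged.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from the CSV.
df = pd.DataFrame({
    "time": [23, 42, 12, 98],
    "cause": ["a / b / c", "c / d / a / b", "a / d / e", "c / b / e / d"],
})

# Split each string on " / " and emit one 0/1 column per distinct token.
dummies = df["cause"].str.get_dummies(sep=" / ")

# Index-aligned join: keep only "time" from the original and add the dummies.
result = df[["time"]].join(dummies)
print(result)
```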

Or use pop to remove the cause column from the original dataframe and join its dummies in a single step:

df = df.join(df.pop('cause').str.get_dummies(sep=' / '))
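Here is a short check of the pop variant, again on inline sample data. DataFrame.pop removes the cause column from df and returns it as a Series. The reassigned df therefore ends up with time plus the indicator columns, and no cause column is left over.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [23, 42, 12, 98],
    "cause": ["a / b / c", "c / d / a / b", "a / d / e", "c / b / e / d"],
})

# pop mutates df (drops "cause") and hands back the removed column.
# join then attaches the dummy columns to what is left of df.
df = df.join(df.pop("cause").str.get_dummies(sep=" / "))
print(df.columns.tolist())  # ['time', 'a', 'b', 'c', 'd', 'e']
```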

Output:

   time  a  b  c  d  e
0    23  1  1  1  0  0
1    42  1  1  1  1  0
2    12  1  0  0  1  1
3    98  0  1  1  1  1
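The question asks about map specifically, so here is a sketch of a map-based equivalent for comparison. It splits each cell once into a set of causes. Then, for each distinct cause, it maps every set to 1 or 0 depending on membership. This is an alternative technique rather than part of the answer above; str.get_dummies is likely the simpler choice.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [23, 42, 12, 98],
    "cause": ["a / b / c", "c / d / a / b", "a / d / e", "c / b / e / d"],
})

# Split each cell on the separator once and store it as a set for fast lookups.
cause_sets = df["cause"].str.split(" / ").map(set)

out = df[["time"]].copy()
# Collect every distinct cause across all rows, in alphabetical order.
for letter in sorted(set().union(*cause_sets)):
    # map checks each cell: 1 if this cause occurs in the row, else 0.
    out[letter] = cause_sets.map(lambda s: int(letter in s))

print(out)
```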