How to transform a data frame by grouping by an individual and checking if a feature exists from a l

I have the following issue. I have a data frame like this:

ID	feature
Person_1	18
Person_1	19
Person_1	23
Person_1	59
Person_2	11
Person_2	23
Person_2	59
Person_3	11
Person_3	18
Person_3	1001
Person_3	1239
Person_4	23
Person_4	6531
Person_4	19843
Person_4	200012
……
Person_60	….

Each feature is in a new row. I have a list of features that I could have:

features
11
18
19
23
59
1001
1239
6531
19843
200012

I need the output to look like this:

	11	18	19	23	59	1001	1239	6531	19843	200012
Person_1	0	1	1	1	1	0	0	0	0	0
Person_2	1	0	0	1	1	0	0	0	0	0
Person_3	1	1	0	0	0	1	1	0	0	0
Person_4	0	0	0	1	0	0	0	1	1	1

Each person should get a single row, with a 1 or 0 in each column showing whether that person has the corresponding feature from the list.

I've tried something like this, but it's not even close.

for i in pd.DataFrame[~ df.duplicated(subset=['id'])]:
  for Feature in feature_list:
    if feature_list in df['feature'].unique():
      print('1')
    else:
      print('0')

I'm a bit lost. Could you help me figure out how to approach this problem?

Thank you very much

CodePudding user response：

There are a number of ways you could do this. Here's one way.

Starting with

df = pd.DataFrame([
    ["Person_1", 1],
    ["Person_1", 2],
    ["Person_2", 1],
    ["Person_3", 3],
], columns=["ID", "feature"])

which looks like

         ID  feature
0  Person_1        1
1  Person_1        2
2  Person_2        1
3  Person_3        3

you can use a groupby followed by unstack:

df = df.groupby(["ID", "feature"]).size().unstack(fill_value=0).reset_index()

which yields

feature        ID  1  2  3
0        Person_1  1  1  0
1        Person_2  1  0  0
2        Person_3  0  0  1
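
Note that size() counts rows, so a repeated (ID, feature) pair would show up as 2 rather than 1, and unstack only creates columns for features that actually occur in the data. Here is a minimal sketch of one way to get strict 0/1 flags with one column per entry of the full feature list. It assumes the column names ID and feature from the question and a plain Python list called feature_list (the name used in the question's attempt); it swaps in pd.crosstab, which is an alternative to the groupby/unstack line above, not a correction of it.

```python
import pandas as pd

# Sample rows copied from the question (Person_1 to Person_4).
df = pd.DataFrame({
    "ID": ["Person_1"] * 4 + ["Person_2"] * 3 + ["Person_3"] * 4 + ["Person_4"] * 4,
    "feature": [18, 19, 23, 59,
                11, 23, 59,
                11, 18, 1001, 1239,
                23, 6531, 19843, 200012],
})

# Full list of possible features; some may be absent from a given data set.
feature_list = [11, 18, 19, 23, 59, 1001, 1239, 6531, 19843, 200012]

result = (
    pd.crosstab(df["ID"], df["feature"])           # counts per (ID, feature) pair
    .clip(upper=1)                                 # turn counts into 0/1 flags
    .reindex(columns=feature_list, fill_value=0)   # one column per listed feature, in list order
)
print(result)
```

The reindex step is what keeps a column for every listed feature even when nobody has it, and it also fixes the column order to match the list. If ID should be a regular column rather than the index, result.reset_index() can be appended.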