If one of my dataframes gives me some info about items:
  itemId  property_1    property_2  property_n  Decision
0     i1       88.90           NaN           0         1
1     i2       87.09  7.653800e+06           0         0
2     i3       78.90  7.623800e+06           1         1
3     i4       93.02           NaN           1         0
...
And the other one gives me some info about how users interacted with the items:
  userId itemId  Decision
0     u1     i1         0
1     u1     i2         1
2     u2     i1         1
3     u2     i3         0
4     u2     i4         1
5     u3     i5         0
...
I am interested in predicting the Decision, which is easy to do if I work with each dataframe separately. But can I somehow incorporate the second dataframe into the first one, given that each item appears in the second one multiple times with different Decisions?
I would like to have something like:
  itemId  property_1    property_2  property_n  u1_decision  ...  Decision
0     i1       88.90           NaN           0            0  ...         1
1     i2       87.09  7.653800e+06           0            1  ...         0
2     i3       78.90  7.623800e+06           1          NaN  ...         1
3     i4       93.02           NaN           1          NaN  ...         0
...
So each user becomes a column, resulting in something very sparse. The first question is whether this makes sense, and the second is how to merge the rows of the second dataframe into the first one as columns (I know how to df.merge on Decision, but this doesn't give me the desired result).
CodePudding user response:
You can pivot the second table so that each user becomes a column:
df.pivot(index='itemId', columns='userId', values='Decision').reset_index()
Then you can do the merge on itemId.
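For illustration, here is a minimal end-to-end sketch of the pivot followed by the merge, using the sample data from the question. The names items, interactions, wide and merged, and the "_decision" suffix, are assumptions made for this example, not something from the original post. Note that pivot raises a ValueError if the same (itemId, userId) pair occurs more than once; in that case pivot_table with an aggregation function would be one alternative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (variable names are assumed).
items = pd.DataFrame({
    "itemId": ["i1", "i2", "i3", "i4"],
    "property_1": [88.90, 87.09, 78.90, 93.02],
    "property_2": [np.nan, 7.653800e+06, 7.623800e+06, np.nan],
    "property_n": [0, 0, 1, 1],
    "Decision": [1, 0, 1, 0],
})
interactions = pd.DataFrame({
    "userId": ["u1", "u1", "u2", "u2", "u2", "u3"],
    "itemId": ["i1", "i2", "i1", "i3", "i4", "i5"],
    "Decision": [0, 1, 1, 0, 1, 0],
})

# One row per item, one column per user; pairs that never occurred become NaN.
wide = (
    interactions
    .pivot(index="itemId", columns="userId", values="Decision")
    .add_suffix("_decision")   # u1 -> u1_decision, so names do not clash with "Decision"
    .reset_index()
)
wide.columns.name = None       # drop the leftover "userId" label on the column axis

# A left merge keeps every row of items; i5 (absent from items) is dropped.
merged = items.merge(wide, on="itemId", how="left")

# Optional: move the target column to the end, as in the desired layout.
merged = merged[[c for c in merged.columns if c != "Decision"] + ["Decision"]]

print(merged)
```

With how="left", items that no user interacted with would still be kept, with NaN in every user column. A user who only interacted with items missing from the first dataframe (u3 here) ends up as an all-NaN column.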
