First, I want to point out that I didn't found the answer for my question here in stackoverflow nor in pandas documentation, so, if the question had been asked before, I'd appreciate a link for that thread.
I want to join two DataFrames as follows.
df1 =
key x y z
0 x0 y0 z0
1 x1 y1 z1
...
10 x10 y10 z10
df2 =
key w v u
0 w0 v0 u0
0 w0 v0 u0
0 w0 v0 u0
1 w1 v1 u1
1 w1 v1 u1
2 w2 v2 u2
3 w3 v3 u3
...
10 w10 v10 u10
10 w10 v10 u10
desired_df_output =
key x y z w v u
0 x0 y0 z0 w0 v0 u0
1 x1 y1 z1 w1 v1 u1
...
10 x10 y10 z10 w10 v10 u10
I've tried this df1.join(df2, how='inner', on='key'), but I get this error: TypeError: object of type 'NoneType' has no len().
Thanks
CodePudding user response:
It seems df2 has duplicates values, so if you drop them using drop_duplicates method and merge with df1 from the right side, you get the desired outcome.
out = df1.merge(df2.drop_duplicates(), on='key')
Output:
key x y z w v u
0 0 x0 y0 z0 w0 v0 u0
1 1 x1 y1 z1 w1 v1 u1
2 10 x10 y10 z10 w10 v10 u10
CodePudding user response:
import pandas as pd
df1 = pd.DataFrame({'k':[0, 1, 2, 3],
'x':['x0', 'x1', 'x2', 'x3'],
'y':['y0', 'y1', 'y2', 'y3'],
'z':['z0', 'z1', 'z2', 'z3']
})
df1.set_index('k', inplace=True)
df2 = pd.DataFrame({'k':[0, 0, 0, 1, 1, 1],
'v':['v0', 'v0', 'v0','v1', 'v1', 'v1',],
'w':['w0', 'w0', 'w0','w1', 'w1', 'w1',],
'u':['u0', 'u0', 'u0','u1', 'u1', 'u1',]
})
df2.set_index('k', inplace=True)
df_merged = pd.merge(df1, df2.drop_duplicates(), how='inner', left_index=True, right_index=True)
df_merged
x y z v w u
k
0 x0 y0 z0 v0 w0 u0
1 x1 y1 z1 v1 w1 u1
