Home > Back-end >  Joining two DataFrames with Pandas, one with 1 row per key, and the other with several rows per key
Joining two DataFrames with Pandas, one with 1 row per key, and the other with several rows per key

Time:01-11

First, I want to point out that I didn't found the answer for my question here in stackoverflow nor in pandas documentation, so, if the question had been asked before, I'd appreciate a link for that thread.

I want to join two DataFrames as follows.

df1 =

key   x   y   z
0    x0  y0  z0
1    x1  y1  z1
...
10  x10 y10 z10

df2 =

key w   v   u
0   w0  v0  u0
0   w0  v0  u0
0   w0  v0  u0
1   w1  v1  u1
1   w1  v1  u1
2   w2  v2  u2
3   w3  v3  u3
...
10  w10 v10 u10
10  w10 v10 u10

desired_df_output =

key  x   y   z   w   v   u
0   x0  y0  z0  w0  v0  u0
1   x1  y1  z1  w1  v1  u1
...
10 x10 y10 z10 w10 v10 u10

I've tried this df1.join(df2, how='inner', on='key'), but I get this error: TypeError: object of type 'NoneType' has no len().

Thanks

CodePudding user response:

It seems df2 has duplicates values, so if you drop them using drop_duplicates method and merge with df1 from the right side, you get the desired outcome.

out = df1.merge(df2.drop_duplicates(), on='key')

Output:

  key    x    y    z    w    v    u
0   0   x0   y0   z0   w0   v0   u0
1   1   x1   y1   z1   w1   v1   u1
2  10  x10  y10  z10  w10  v10  u10

CodePudding user response:

import pandas as pd

df1 = pd.DataFrame({'k':[0, 1, 2, 3],
                    'x':['x0', 'x1', 'x2', 'x3'],
                    'y':['y0', 'y1', 'y2', 'y3'],
                    'z':['z0', 'z1', 'z2', 'z3']
                   })
df1.set_index('k', inplace=True)

df2 = pd.DataFrame({'k':[0, 0, 0, 1, 1, 1],
                    'v':['v0', 'v0', 'v0','v1', 'v1', 'v1',],
                    'w':['w0', 'w0', 'w0','w1', 'w1', 'w1',],
                    'u':['u0', 'u0', 'u0','u1', 'u1', 'u1',]
                   })
df2.set_index('k', inplace=True)

df_merged = pd.merge(df1, df2.drop_duplicates(), how='inner', left_index=True, right_index=True)

df_merged

    x   y   z   v   w   u
k                       
0   x0  y0  z0  v0  w0  u0
1   x1  y1  z1  v1  w1  u1
  •  Tags:  
  • Related