I have two DataFrames:
df1
| X | Y |
|---|---|
| a | c |
| b | d |
df2
| Z | Q |
|---|---|
| i | f |
| j | h |
The number of rows and columns is not fixed.
I need to compare X and Z: when an element of X is equal to an element of Z (suppose a is equal to j), the value corresponding to a in Y (c) should be replaced by the value corresponding to j in Q (h).
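My current approach looks like this:
```python
# Nested loop: compare every element of X with every element of Z
for k in range(len(df1)):
    for p in range(len(df2)):
        if df1.iloc[k]['X'] == df2.iloc[p]['Z']:
            # .at is label-based, so use the index label of row k
            df1.at[df1.index[k], 'Y'] = df2.iloc[p]['Q']
```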
With large DataFrames this procedure is obviously too slow, since it compares every row of df1 with every row of df2. Does anyone know how to speed it up? I read that NumPy offers vectorization. How could this be done? Thanks for the help!
Answer:
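If df1 and df2 have the same length and the same index, and each X only needs to be compared with the Z in the same row, simply do:
```python
import numpy as np

df1["Y"] = np.where(df1.X == df2.Z, df2.Q, df1.Y)
```
Note that `df1.X == df2.Z` compares the columns row by row (row i of X against row i of Z only), so it will not find a match that sits in a different row, and pandas raises an error if the two Series are not identically labeled.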
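For the general case in the question (any X may match any Z, and the frames can differ in length), a hash lookup with `Series.map` avoids the nested loop. This is a minimal sketch of that alternative technique, not the answer's `np.where` method; the example data below is made up for illustration. It assumes that when a value appears more than once in Z, the last matching Q should win, which is what the original nested loop does.

```python
import pandas as pd

# Illustrative data shaped like the question's tables (values are made up)
df1 = pd.DataFrame({"X": ["a", "b", "e"], "Y": ["c", "d", "g"]})
df2 = pd.DataFrame({"Z": ["i", "a"], "Q": ["f", "h"]})

# Build a Z -> Q lookup Series. Series.map needs a unique index, so drop
# duplicate Z values; keep="last" mirrors the loop, where a later match
# overwrites an earlier one.
lookup = df2.drop_duplicates(subset="Z", keep="last").set_index("Z")["Q"]

# Rows of df1 whose X appears somewhere in Z
matched = df1["X"].isin(lookup.index)

# Replace Y only on matched rows; unmatched rows keep their original Y
df1.loc[matched, "Y"] = df1.loc[matched, "X"].map(lookup)

print(df1)
```

Here only `a` occurs in Z, so its Y becomes `h` while the other rows are left unchanged. Each X is looked up once in a hash-based index, so the cost grows roughly with `len(df1) + len(df2)` rather than with their product.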
