change value by key of another dataframe-CodePudding

I think it is very easy and simple question. but it is very difficult for me. please help! I can write R code

b$v2 <- ifelse(a$key==b$key, a$v2, b$v2)

input df a is a df, it have key and answer value. b is a df, it have to change according to a.

a1=[11,22,44]
a2=['a','b','d']
a3={'key':a1, 'v2':a2}
a=pd.DataFrame(a3)

b1=[11,22,33,44,55]
b2=['ff','gg','c','hh','e']
b3={'key':b1, 'v2':b2}
b=pd.DataFrame(b3)

final result (what I want)

c1=[11,22,33,44,55]
c2=['a','b','c','d','e']
c3={'key':c1, 'v2':c2}
c=pd.DataFrame(c3)

but I don't know how to write python. I tried.

My concept

first, making True, False

i=1
a['key'].iloc[i]==b['key']

I have 5 bools (F, T, F, F, F)

second, When True value, Change 'v2' col of df b but it is not working

b['v2'] = np.where(a['key'].iloc[i]==b['key'], a['v2'], b['v2'])

error message :

ValueError: operands could not be broadcast together with shapes (5,) (3,) (5,)

I think I have a misunderstanding using 'np.where' function.

CodePudding user response：

we could use Series.map for mapping key -> v2. Set a's key as index using set_index. Now, use it in Series.map

m = a.set_index('key')['v2']
b['v2'] = b['key'].map(m).fillna(b['v2'])

print(b)
#    key v2
# 0   11  a
# 1   22  b
# 2   33  c
# 3   44  d
# 4   55  e

CodePudding user response：

Outer merge the two dataframes on 'key' and then fill NaN values with values from b and drop the column from b:

out = a.merge(b, on='key', how='outer', suffixes=['','_y'])
out['v2'].fillna(out['v2_y'], inplace=True)
out.drop('v2_y', axis=1, inplace=True)

Another option is to map 'v2' values from b to NaNs in the merged dataframe:

out = a.merge(b[['key']], on='key', how='outer')
out['v2'].fillna(out['key'].map(b.set_index('key')['v2']), inplace=True)

Output:

   key   v2
0   11    a
1   22    b
2   44    d
3   33    c
4   55    e

CodePudding user response：

Shortest way out would be to find what of b is not in a and then append it to a.

c=a.append(b[~(b['key'].isin(a['key']))]).sort_values(by=['key'])

Long way out is to use outer merge, fill in missing using combine_first and then drop suffixed columns not wanted

c=b.merge(a, how='outer', on='key', suffixes=('_x',''))
c=c.assign(v2=c['v2'].combine_first(c['v2_x'])).drop(columns=['v2_x'])



   key v2
0   11  a
1   22  b
2   33  c
3   44  d
4   55  e