I think it is very easy and simple question. but it is very difficult for me. please help! I can write R code
b$v2 <- ifelse(a$key==b$key, a$v2, b$v2)
input df a is a df, it have key and answer value. b is a df, it have to change according to a.
a1=[11,22,44]
a2=['a','b','d']
a3={'key':a1, 'v2':a2}
a=pd.DataFrame(a3)
b1=[11,22,33,44,55]
b2=['ff','gg','c','hh','e']
b3={'key':b1, 'v2':b2}
b=pd.DataFrame(b3)
final result (what I want)
c1=[11,22,33,44,55]
c2=['a','b','c','d','e']
c3={'key':c1, 'v2':c2}
c=pd.DataFrame(c3)
but I don't know how to write python. I tried.
My concept
first, making True, False
i=1
a['key'].iloc[i]==b['key']
I have 5 bools (F, T, F, F, F)
second, When True value, Change 'v2' col of df b but it is not working
b['v2'] = np.where(a['key'].iloc[i]==b['key'], a['v2'], b['v2'])
error message :
ValueError: operands could not be broadcast together with shapes (5,) (3,) (5,)
I think I have a misunderstanding using 'np.where' function.
CodePudding user response:
we could use Series.map for mapping key -> v2. Set a's key as index using set_index. Now, use it in Series.map
m = a.set_index('key')['v2']
b['v2'] = b['key'].map(m).fillna(b['v2'])
print(b)
# key v2
# 0 11 a
# 1 22 b
# 2 33 c
# 3 44 d
# 4 55 e
CodePudding user response:
Outer merge the two dataframes on 'key' and then fill NaN values with values from b and drop the column from b:
out = a.merge(b, on='key', how='outer', suffixes=['','_y'])
out['v2'].fillna(out['v2_y'], inplace=True)
out.drop('v2_y', axis=1, inplace=True)
Another option is to map 'v2' values from b to NaNs in the merged dataframe:
out = a.merge(b[['key']], on='key', how='outer')
out['v2'].fillna(out['key'].map(b.set_index('key')['v2']), inplace=True)
Output:
key v2
0 11 a
1 22 b
2 44 d
3 33 c
4 55 e
CodePudding user response:
Shortest way out would be to find what of b is not in a and then append it to a.
c=a.append(b[~(b['key'].isin(a['key']))]).sort_values(by=['key'])
Long way out is to use outer merge, fill in missing using combine_first and then drop suffixed columns not wanted
c=b.merge(a, how='outer', on='key', suffixes=('_x',''))
c=c.assign(v2=c['v2'].combine_first(c['v2_x'])).drop(columns=['v2_x'])
key v2
0 11 a
1 22 b
2 33 c
3 44 d
4 55 e
