I'm trying to merge the values of one column from df2 into df1. df1.merge(df2, how='outer') seems close to what I need, but the result is not what I want because the shared key (site1) ends up duplicated. Passing on='sub' introduces _x and _y columns, which I don't want either.
In the example below, sub=site1 appears in both df1 and df2, so 'fred' from df2 should replace 'andy' in the own column of df1.
# Pandas Merge test:
import pandas as pd
df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})
>>> df1
     sub   iss  rem    own
0  site1  enc1    1   andy
1  site2  enc2    3  brian
2  site3  enc3    5   cody
>>> df2
     sub  rem    own
0  data1    2  david
1  data2    4  edger
2  site1    6   fred
>>> df1.merge(df2, how='outer')
     sub   iss  rem    own
0  site1  enc1    1   andy
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger
5  site1   NaN    6   fred
>>> df1.merge(df2, on='sub', how='outer')
     sub   iss  rem_x  own_x  rem_y  own_y
0  site1  enc1    1.0   andy    6.0   fred
1  site2  enc2    3.0  brian    NaN    NaN
2  site3  enc3    5.0   cody    NaN    NaN
3  data1   NaN    NaN    NaN    2.0  david
4  data2   NaN    NaN    NaN    4.0  edger
Expected Output:
     sub   iss  rem    own
0  site1  enc1    1   fred
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger
CodePudding user response:
A fairly simple possible solution uses pd.concat: filter df1 so that it contains only the records not present in df2, then concatenate the two.
# Set 'sub' as the index so rows can be filtered and aligned by it.
df1 = df1.set_index('sub')
df2 = df2.set_index('sub')
Then keep only the df1 rows whose index is not in df2 and concatenate them with df2.
df3 = pd.concat([df1[~df1.index.isin(df2.index)], df2])
Output:
print(df3)
        iss  rem    own
sub
site2  enc2    3  brian
site3  enc3    5   cody
data1   NaN    2  david
data2   NaN    4  edger
site1   NaN    6   fred
This does not restore df1's values of rem and iss for site1, though.
If that is also needed, one possible solution is an additional loc assignment. Like this:
# For every sub that also exists in df1, copy iss and rem back from df1 (aligned on the index).
df3.loc[df3.index.isin(df1.index), ['iss', 'rem']] = df1[['iss', 'rem']]
Final output:
        iss  rem    own
sub
site2  enc2    3  brian
site3  enc3    5   cody
data1   NaN    2  david
data2   NaN    4  edger
site1  enc1    1   fred
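The two steps above can also be wrapped in a small helper. This is only a sketch under assumptions: the function name overlay_rows and its keep_from_left parameter are made up for illustration and are not part of pandas, and the helper starts from the original, un-indexed df1 and df2 so that neither input is modified.

```python
import pandas as pd


def overlay_rows(left, right, key, keep_from_left):
    """Hypothetical helper: rows of `right` replace rows of `left` that share
    the same key, except for the columns listed in `keep_from_left`, which are
    copied back from `left`."""
    l = left.set_index(key)
    r = right.set_index(key)
    # Keep only the left rows whose key is absent from right, then append right.
    out = pd.concat([l[~l.index.isin(r.index)], r])
    # For keys present in both frames, restore the protected columns from left.
    common = r.index.intersection(l.index)
    out.loc[common, keep_from_left] = l.loc[common, keep_from_left]
    return out


df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'],
                    'iss': ['enc1', 'enc2', 'enc3'],
                    'rem': [1, 3, 5],
                    'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'],
                    'rem': [2, 4, 6],
                    'own': ['david', 'edger', 'fred']})

df3 = overlay_rows(df1, df2, key='sub', keep_from_left=['iss', 'rem'])
print(df3)
```

Selecting the overlapping rows by key rather than by comparing rem values means the restore step should still behave if df2 happens to carry a rem value that also occurs somewhere in df1.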
CodePudding user response:
Edit: changed to using update instead of fillna as per @bkeesey's comment
You need to merge on sub, then combine each pair of overlapping columns with update and drop the leftover suffixed columns.
Try:
import pandas as pd
df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})
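dfm = df1.merge(df2, on='sub', how='outer', suffixes=["", "_y"])
# update() only overwrites with non-NaN values, so df2's rem/own win wherever df2 has a row
dfm.update(dfm[["rem_y", "own_y"]].rename(columns={"rem_y": "rem", "own_y": "own"}))
# drop the leftover df2 columns
dfm = dfm.drop(columns=["rem_y", "own_y"])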
Result:
     sub   iss  rem    own
0  site1  enc1  6.0   fred
1  site2  enc2  3.0  brian
2  site3  enc3  5.0   cody
3  data1   NaN  2.0  david
4  data2   NaN  4.0  edger
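Note that the result above carries df2's rem (6.0) for site1, whereas the expected output in the question keeps df1's rem (1) and takes only own from df2. A possible variant that chooses the winner per column is sketched below; the suffixes "_1" and "_2" and the names m and out are arbitrary choices for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'],
                    'iss': ['enc1', 'enc2', 'enc3'],
                    'rem': [1, 3, 5],
                    'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'],
                    'rem': [2, 4, 6],
                    'own': ['david', 'edger', 'fred']})

# Outer merge on the key; overlapping columns get explicit suffixes.
m = df1.merge(df2, on='sub', how='outer', suffixes=('_1', '_2'))

# own: df2 wins, fall back to df1 where df2 has no row.
m['own'] = m['own_2'].fillna(m['own_1'])
# rem: df1 wins, fall back to df2 where df1 has no row.
# After filling there are no NaN left, so the column can go back to int.
m['rem'] = m['rem_1'].fillna(m['rem_2']).astype(int)

# Keep only the final columns, in the order used in the question.
out = m[['sub', 'iss', 'rem', 'own']]
print(out)
```

The row order of an outer merge may differ between pandas versions, so sort or set an index on sub if the order matters.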
CodePudding user response:
Here is one way to do it:
# update df1['own'] with the matching values from df2, using map;
# subs without a match in df2 keep their original value via fillna
df1['own'] = df1['sub'].map(df2.set_index('sub')['own']).fillna(df1['own'])
out = (pd.concat([df1, df2])                # concatenate the two DataFrames
         .drop_duplicates(subset=['sub'])   # keep the first (df1) row per sub
         .reset_index(drop=True))           # renumber the rows from 0
out
     sub   iss  rem    own
0  site1  enc1    1   fred
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger
Alternatively, concatenate first and map afterwards, which leaves df1 unchanged:
# concatenate the two DataFrames and drop the duplicates
out = (pd.concat([df1, df2])
         .drop_duplicates(subset=['sub'])
         .reset_index(drop=True))
# map own in the concatenated result from df2
out['own'] = out['sub'].map(df2.set_index('sub')['own']).fillna(out['own'])
out
     sub   iss  rem    own
0  site1  enc1    1   fred
1  site2  enc2    3  brian
2  site3  enc3    5   cody
3  data1   NaN    2  david
4  data2   NaN    4  edger
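If more than one column should be taken from df2, the map/fillna step could be swapped for DataFrame.update on frames indexed by sub. This is a sketch of that alternative, not the approach used above; the list cols_from_df2 is a name introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'],
                    'iss': ['enc1', 'enc2', 'enc3'],
                    'rem': [1, 3, 5],
                    'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'],
                    'rem': [2, 4, 6],
                    'own': ['david', 'edger', 'fred']})

# Columns whose values should come from df2 when a sub exists in both frames.
cols_from_df2 = ['own']

# Stack the frames, keep the first (df1) row per sub, and index by sub.
out = (pd.concat([df1, df2])
         .drop_duplicates(subset='sub', keep='first')
         .set_index('sub'))

# update() aligns on the index and overwrites with df2's non-NaN values in place.
out.update(df2.set_index('sub')[cols_from_df2])

out = out.reset_index()
print(out)
```

Because drop_duplicates keeps the first occurrence, df1 must come first in the concat for its iss and rem values to survive.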
