Consider the following two data frames
df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
Running
df3 = pd.merge(df1, df2, on='a')
Yields
a b c
0 foo 1 3
But why not the following?
a b c
0 foo 1 3
1 bar 2 -
1 baz - 4
What do I need to tell python to get it to output both rows?
CodePudding user response:
A pandas merge does by default an inner join, if you are familiar with database joins. That means it only returns the rows that have a matching entry in both the left and right dataframe. For you, that is just 'foo'.
You can change that by setting the how argument. If you want all rows from both left, and right set it to outer, if you want to keep all from the left frame set it to leftand if you want to keep all from the right frame set it to right.
CodePudding user response:
pd.merge(df1, df2, on='a', how='outer') will join on matching keys with all non matching keys returned as a new row will NaN filling in the blanks.
try here for an overview of different types of SQL style joins which merge uses as basis.
