I am getting NA values when tried to merge two dataframes by common columns; I think it is straightforward to merge two dataframe by using pandas.merge and I should expect correct output; but I am not getting correct output. Here is what I did:
my current attempt:
import pandas as pd
df1=pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/df1.csv", index_col=0)
df2=pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/df2.csv", index_col=0)
merged_df = df1.merge(df2, how='left', left_on=['ST', 'County_Name'], right_on=['ST', 'County_Name'])
by doing so, I want to match County_Name, ST so merge them by left join and having FIPS_Code column as well; but now FIPS_Code become NAN, I am sure County_Name from df1 can match in df2; I did manually checked but I don't get it why FIPS_Code become NAN; I want it left join because I want merged_dataframe has same shape as df1.
Can anyone point me out what went wrong with above join? Is that because of data or joining is not working here? Any thoughts?
CodePudding user response:
There is space in your df2 County_Name column
df2['County_Name'] = df2['County_Name'].str.strip()
merged_df = df1.merge(df2, how='left', left_on=['ST', 'County_Name'], right_on=['ST', 'County_Name'])
print(merged_df)
ST County_Name Orientation state_x Year state_y FIPS_Code
0 IL Adams Defense Illinois 2015 Illinois 17001.0
1 IL Alexander Defense Illinois 2015 Illinois 17003.0
2 IL Bond Defense Illinois 2015 Illinois 17005.0
3 IL Boone Defense Illinois 2015 Illinois 17007.0
4 IL Brown Defense Illinois 2015 Illinois 17009.0
.. .. ... ... ... ... ... ...
97 IL Whiteside Defense Illinois 2015 Illinois 17195.0
98 IL Will Neutral Illinois 2015 Illinois 17197.0
99 IL Williamson Defense Illinois 2015 Illinois 17199.0
100 IL Winnebago Defense Illinois 2015 Illinois 17201.0
101 IL Woodford Defense Illinois 2015 Illinois 17203.0

