I have 2 dataframes of different sizes. The first dataframe(df1) has 4 columns, but two of those columns have the same name as the columns in the second dataframe(df2), which is only comprised of 2 columns. The columns in common are ['ID'] and ['Department'].
I want to check if any ID from df2 are in df1. If so, I want to replace df1['Department'] value with df2['Department'] value.
For instance, df1 looks something like this:
ID Department Yrs Experience Education
1234 Science 1 Bachelors
2356 Art 3 Bachelors
2456 Math 2 Masters
4657 Science 4 Masters
And df2 looks something like this:
ID Department
1098 P.E.
1234 Technology
2356 History
I want to check if the ID from df2 is in df1 and if so, update Department. The output should looks something like this:
ID Department Yrs Experience Education
1234 **Technology** 1 Bachelors
2356 **History** 3 Bachelors
2456 Math 2 Masters
4657 Science 4 Masters
The expected updates to df1 are in bold
Is there an efficient way to do this?
Thank you for taking the time to read this and help.
CodePudding user response:
You can use ID of df1 to map with the Pandas series formed by setting ID on df2 as index and taking the column of Department from df2 (this acts as a mapping table).
Then, in case of no match of ID from df2, we fill-in the original values of Department from df1 (to retain original values in case of no match):
df1['Department'] = (df1['ID'].map(df2.set_index('ID')['Department'])
.fillna(df1['Department'])
)
Result:
print(df1)
ID Department Yrs Experience Education
0 1234 Technology 1 Bachelors
1 2356 History 3 Bachelors
2 2456 Math 2 Masters
3 4657 Science 4 Masters
CodePudding user response:
Try:
df1["Department"].update(
df1[["ID"]].merge(df2, on="ID", how="left")["Department"]
)
print(df1)
Prints:
ID Department Yrs Experience Education
0 1234 Technology 1 Bachelors
1 2356 History 3 Bachelors
2 2456 Math 2 Masters
3 4657 Science 4 Masters
CodePudding user response:
df_1 = pd.DataFrame(data={'ID':[1234, 2356, 2456, 4657], 'Department':['Science', 'Art', 'Math', 'Science']})
df_2 = pd.DataFrame(data={'ID':[1234, 2356], 'Department':['Technology', 'History']})
df_1.loc[df_1['ID'].isin(df_2['ID']), 'Department'] = df_2['Department']
OutPut
ID Department
0 1234 Technology
1 2356 History
2 2456 Math
3 4657 Science
