I have following 2 dataframes -
df1:
job e_msg
0 f_maf_crm_customer_request_global_jgvcc permission denied for relation f_maf_custome
df2:
master_job error_msg current_count
0 JGCC_Genesys_Conversations_Aggregates
1 f_maf_crm_customer_request_global_jgvcc 100
if df1 is having the same job name so df2 should update error_msg with df1.e_msg and current_count should change as 0.
In final output:
master_job error_msg current_count
0 JGCC_Genesys_Conversations_Aggregates
1 f_maf_crm_customer_request_global_jgvcc permission denied for relation f_maf_custome 0
CodePudding user response:
A simple approach would be this one liner.
df2.loc[df2.master_job.isin(df1.job), ['error_msg', 'current_count']] = [df1['e_msg'].values, 0]
This gives the output you want. Please note that I have abbreviated the error_msg for brevity. You can test it out yourself and see that the output message is the same as the one you are using.
job error_msg current_count
0 JGCC_Genesys_Conversations_Aggregates
1 f_maf_crm_customer_request_global_jgvcc permission denied... 0.0
What that single line of code does is quite simple. It checks if the records in df1 exist in df2 and if it does, then the values in df2 are changed accordingly to match your requirement. You can read more about isin here.
CodePudding user response:
You can try an outer join on the two dataframes from the merge function in pandas. If you don't know what joins are then refer to this webpage.
Code -
merged_df = pd.merge(df1, df2, on = "master_job", how = "outer")
merged_df
Output -
| master_job | error_msg_x | current_count | |
|---|---|---|---|
| 0 | JGCC_Genesys_Conversations_Aggregates | nan | |
| 1 | f_maf_crm_customer_request_global_jgvcc | permission denied for relation f_maf_custome | 100 |
Here we are getting two error_msg columns because error_msg_x is from df1 and error_msg_y is from df2. Because error_msg_y isn't of any use to use, we can drop it using the code below.
merged_df = merged_df.drop(columns = "error_msg_y")
We can merge the two dataframes and drop the error_msg_y column in just one line using this code,
merged_df = pd.merge(df1, df2, on = "master_job", how = "right").drop(columns = "error_msg_y")
