I have two dataframes in the following form:
df1
| id | name | df2_id |
|---|---|---|
| one | foo | template_x |
| two | bar | template_y |
| three | baz | template_z |
df2
| id | name | value |
|---|---|---|
| template_x | aaa | zzz |
| template_x | bbb | yyy |
| template_y | ccc | xxx |
| template_y | ddd | www |
| template_z | eee | vvv |
| template_z | fff | uuu |
For each row of df2, I'd like to find the df1 row whose `df2_id` equals `df2.id` and append `_` plus that row's `df1.id` to both `name` and `value`, to get:
df3
| id | concat_name | concat_val |
|---|---|---|
| template_x | aaa_one | zzz_one |
| template_x | bbb_one | yyy_one |
| template_y | ccc_two | xxx_two |
| template_y | ddd_two | www_two |
| template_z | eee_three | vvv_three |
| template_z | fff_three | uuu_three |
Constraints/caveats:
- All relevant values are strings, no integers.
- Sometimes `df2.value` is empty, and I would like to keep it empty.
My approach was to use a nested for loop with `df.iterrows`, but it's giving me trouble.
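For reference, the `iterrows` approach does not need a nested loop. A minimal sketch, assuming each `df2_id` appears only once in df1, builds a lookup dict once and then makes a single pass over df2. The blank `value` in the `fff` row is hypothetical test data for the empty case, and "empty" is assumed to mean either `""` or `NaN`.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": ["one", "two", "three"],
    "name": ["foo", "bar", "baz"],
    "df2_id": ["template_x", "template_y", "template_z"],
})
df2 = pd.DataFrame({
    "id": ["template_x", "template_x", "template_y", "template_y", "template_z", "template_z"],
    "name": ["aaa", "bbb", "ccc", "ddd", "eee", "fff"],
    "value": ["zzz", "yyy", "xxx", "www", "vvv", ""],  # hypothetical empty value in the last row
})

# Map each df2_id to its df1 id once (assumes df2_id is unique in df1)
suffix_for = dict(zip(df1["df2_id"], df1["id"]))

rows = []
for _, row in df2.iterrows():
    suffix = "_" + suffix_for[row["id"]]
    value = row["value"]
    # Leave empty values ("" or NaN) untouched
    concat_val = value if pd.isna(value) or value == "" else value + suffix
    rows.append({"id": row["id"], "concat_name": row["name"] + suffix, "concat_val": concat_val})

df3 = pd.DataFrame(rows)
print(df3)
```

Row-wise loops like this get slow on large frames, which is why the vectorized answers below are usually preferable.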
Answer:
It seems you can merge the DataFrames and then append the df1 `id` to the relevant columns:
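```python
import pandas as pd

# Pair each df2 row with the df1 id whose df2_id matches df2.id (df1's id becomes id_)
merged = df1[['id', 'df2_id']].merge(df2, left_on='df2_id', right_on='id', suffixes=('_', ''))
# Append '_<df1 id>' to name and value (note: an empty-string value becomes '_<id>')
merged['name'] += '_' + merged['id_']
merged['value'] += '_' + merged['id_']
merged = merged.drop(columns=['id_', 'df2_id']).rename(columns={'name': 'concat_name', 'value': 'concat_val'})
```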
Output:
```
           id  concat_name  concat_val
0  template_x      aaa_one     zzz_one
1  template_x      bbb_one     yyy_one
2  template_y      ccc_two     xxx_two
3  template_y      ddd_two     www_two
4  template_z    eee_three   vvv_three
5  template_z    fff_three   uuu_three
```
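With the merge above, an empty-string `value` would come out as `_one` (or `_two`, ...), which conflicts with the second caveat in the question. A minimal sketch of one way to handle that, assuming "empty" may mean either `""` or `NaN`, computes the suffixed column and then restores the original value wherever it was empty, using `Series.mask`. The blank value in the `ddd` row is hypothetical test data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": ["one", "two", "three"],
    "name": ["foo", "bar", "baz"],
    "df2_id": ["template_x", "template_y", "template_z"],
})
df2 = pd.DataFrame({
    "id": ["template_x", "template_x", "template_y", "template_y", "template_z", "template_z"],
    "name": ["aaa", "bbb", "ccc", "ddd", "eee", "fff"],
    "value": ["zzz", "yyy", "xxx", "", "vvv", "uuu"],  # hypothetical empty value in the "ddd" row
})

merged = df1[["id", "df2_id"]].merge(df2, left_on="df2_id", right_on="id", suffixes=("_", ""))
suffix = "_" + merged["id_"]

merged["name"] = merged["name"] + suffix
# Rows whose value is "" or NaN keep their original (empty) value
empty = merged["value"].isna() | merged["value"].eq("")
merged["value"] = (merged["value"] + suffix).mask(empty, merged["value"])

df3 = merged.drop(columns=["id_", "df2_id"]).rename(columns={"name": "concat_name", "value": "concat_val"})
print(df3)
```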
Answer:
Here's a pretty sleek one-liner:
```python
df2[['name', 'value']] = df2[['name', 'value']].T.add('_' + df1.set_index('df2_id').loc[df2['id'], 'id'].reset_index(drop=True)).T
```
Output:
```
>>> df2
           id       name      value
0  template_x    aaa_one    zzz_one
1  template_x    bbb_one    yyy_one
2  template_y    ccc_two    xxx_two
3  template_y    ddd_two    www_two
4  template_z  eee_three  vvv_three
5  template_z  fff_three  uuu_three
```
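The one-liner relies on df2 having a default 0..n-1 index: `reset_index(drop=True)` renumbers the lookup result from 0, and after the transpose `.add` aligns it against df2's index labels. A sketch that avoids both the transposes and that assumption looks up the suffix with `Series.map`, which keeps df2's own index, and adds it row-wise with `DataFrame.add(..., axis=0)`. It assumes `df2_id` is unique in df1, since `map` needs a unique lookup index; the letter index on df2 is hypothetical and only there to show the alignment still works.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": ["one", "two", "three"],
    "name": ["foo", "bar", "baz"],
    "df2_id": ["template_x", "template_y", "template_z"],
})
# Hypothetical non-default index on df2, to show alignment does not depend on 0..n-1 labels
df2 = pd.DataFrame({
    "id": ["template_x", "template_x", "template_y", "template_y", "template_z", "template_z"],
    "name": ["aaa", "bbb", "ccc", "ddd", "eee", "fff"],
    "value": ["zzz", "yyy", "xxx", "www", "vvv", "uuu"],
}, index=list("abcdef"))

# For each df2 row, the matching df1 id, indexed like df2 (assumes df2_id is unique in df1)
suffix = "_" + df2["id"].map(df1.set_index("df2_id")["id"])

# Append the suffix to both columns, aligning the Series with df2's rows (axis=0)
df2[["name", "value"]] = df2[["name", "value"]].add(suffix, axis=0)
print(df2)
```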
