Iterate over pandas dataframe using value in separate dataframe, filtered by shared column

I have two dataframes in the following form:

df1

id	name	df2_id
one	foo	template_x
two	bar	template_y
three	baz	template_z

df2

id	name	value
template_x	aaa	zzz
template_x	bbb	yyy
template_y	ccc	xxx
template_y	ddd	www
template_z	eee	vvv
template_z	fff	uuu

For each row in df1, I'd like to find the rows in df2 where df2.id == df1.df2_id and append the value of df1.id (joined with an underscore) to name and value in each of those rows, to get:

df3

id	concat_name	concat_val
template_x	aaa_one	zzz_one
template_x	bbb_one	yyy_one
template_y	ccc_two	xxx_two
template_y	ddd_two	www_two
template_z	eee_three	vvv_three
template_z	fff_three	uuu_three

Constraints/caveats:

All relevant values are strings, no integers.
Sometimes df2.value is empty, and I would like to keep it empty.

My approach was to use nested for loops with df.iterrows, but it's giving me trouble.

CodePudding user response：

Seems like you can merge the DataFrames and add relevant columns together:

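# Pair each df2 row with the df1 row whose df2_id matches; df1's id column becomes 'id_'
merged = df1[['id','df2_id']].merge(df2, left_on='df2_id', right_on='id', suffixes=('_',''))
# Append '_<df1.id>' to both string columns
merged['name'] += '_' + merged['id_']
merged['value'] += '_' + merged['id_']
merged = merged.drop(columns=['id_', 'df2_id']).rename(columns={'name':'concat_name', 'value':'concat_val'})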

Output:

           id concat_name concat_val
0  template_x     aaa_one    zzz_one
1  template_x     bbb_one    yyy_one
2  template_y     ccc_two    xxx_two
3  template_y     ddd_two    www_two
4  template_z   eee_three  vvv_three
5  template_z   fff_three  uuu_three
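
The question also asks that an empty df2.value stay empty. A missing value (NaN) already stays NaN under string concatenation, but an empty string would turn into something like '_one'. The following is a minimal sketch of the same merge idea with that case guarded, assuming "empty" means either '' or NaN. The sample frames are rebuilt from the question with one value blanked for illustration, and the names suffix and has_value are made up for this example.

```python
import pandas as pd

# Sample data from the question; one df2 value is blanked (assumed to be "")
df1 = pd.DataFrame({
    "id": ["one", "two", "three"],
    "name": ["foo", "bar", "baz"],
    "df2_id": ["template_x", "template_y", "template_z"],
})
df2 = pd.DataFrame({
    "id": ["template_x", "template_x", "template_y",
           "template_y", "template_z", "template_z"],
    "name": ["aaa", "bbb", "ccc", "ddd", "eee", "fff"],
    "value": ["zzz", "", "xxx", "www", "vvv", "uuu"],
})

# Same merge as above: df1's id column comes through as 'id_'
merged = df1[["id", "df2_id"]].merge(
    df2, left_on="df2_id", right_on="id", suffixes=("_", "")
)

suffix = "_" + merged["id_"]
merged["concat_name"] = merged["name"] + suffix

# Only append the suffix where value is neither NaN nor an empty string;
# everywhere else keep the original (empty) value untouched
has_value = merged["value"].notna() & (merged["value"] != "")
merged["concat_val"] = (merged["value"] + suffix).where(has_value, merged["value"])

df3 = merged[["id", "concat_name", "concat_val"]]
print(df3)
```

With this sample data, the second row keeps an empty concat_val while every other row gains its suffix.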

CodePudding user response：

Here's a pretty sleek one-liner:

df2[['name', 'value']] = df2[['name', 'value']].T.add('_' + df1.set_index('df2_id').loc[df2['id'], 'id'].reset_index(drop=True)).T

Output:

>>> df2
           id       name      value
0  template_x    aaa_one    zzz_one
1  template_x    bbb_one    yyy_one
2  template_y    ccc_two    xxx_two
3  template_y    ddd_two    www_two
4  template_z  eee_three  vvv_three
5  template_z  fff_three  uuu_three
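
If the transposes are hard to follow, the same lookup can be written with Series.map. This is only a sketch of an alternative, not what either answer above uses: it assumes every df2_id appears exactly once in df1 (map raises on a duplicated lookup index), and it builds a new df3 instead of overwriting df2. The names lookup and suffix are made up for this example.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "id": ["one", "two", "three"],
    "name": ["foo", "bar", "baz"],
    "df2_id": ["template_x", "template_y", "template_z"],
})
df2 = pd.DataFrame({
    "id": ["template_x", "template_x", "template_y",
           "template_y", "template_z", "template_z"],
    "name": ["aaa", "bbb", "ccc", "ddd", "eee", "fff"],
    "value": ["zzz", "yyy", "xxx", "www", "vvv", "uuu"],
})

# Lookup table: template id -> df1.id (assumes df2_id is unique in df1)
lookup = df1.set_index("df2_id")["id"]

# One suffix per df2 row, aligned on df2's own index
suffix = "_" + df2["id"].map(lookup)

df3 = pd.DataFrame({
    "id": df2["id"],
    "concat_name": df2["name"] + suffix,
    "concat_val": df2["value"] + suffix,
})
print(df3)
```

Rows of df2 whose id has no match in df1 would get NaN in both concatenated columns here, whereas the inner merge drops them.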