I want to populate the column of a dataframe with values from the column of another dataframe when t-CodePudding

I want to populate values of df2['VALUE'] in a new column in df1 df1['New'] when category and location match for both dataframes.

When

df1['category'] = df2['CATEGORY'] AND df1['location'] = df2['LOCATION']

populate values of df2['VALUE'] into a new row in df1 so that using df1['v1'] I can calculate df1['calculatedfield'] = df1['v1']/df1['new']

df1

category	location	type	v1
A	loc1	1	2
A	loc1	2	4
A	loc2	1	6
A	loc2	2	8
B	loc1	1	10
B	loc1	2	12
B	loc2	1	14
B	loc2	2	16

df2

CATEGORY	LOCATION	VALUE
A	loc1	50
A	loc2	30
B	loc1	70
B	loc2	90

output

category	location	type	v1	new(df2['VALUE'])	calculatedfield(v1/new)
A	loc1	1	2	50	0.04
A	loc1	2	4	50	0.08
A	loc2	1	6	30	0.2
A	loc2	2	8	30	0.27
B	loc1	1	10	70	0.14
B	loc1	2	12	70	0.17
B	loc2	1	14	90	0.16
B	loc2	2	16	90	0.18

df1 has more rows than df2, that is why I didn't go with joining the two dataframes. I need to populate df2['new'] with the values from df2['VALUE'] wherever the combination of category and location come up and regardless of the value of df2['type']. I can't drop rows.

I tried

df1['New'] = np.where((df1['category'] == df2['CATEGORY']) & (df1['location'] == df2['location']), df2['VALUE'], None)

and this came up:

ValueError: Can only compare identically-labeled Series objects

CodePudding user response：

You can use pandas.set_index() function to create a multiIndex dataframe for obtaining values from df2 and use the pandas.reset.index() to reset the dataframe. Please see below my sample code. I have replicated your dataframes df1 and df2.

code:

import pandas as pd

data1 = {'category': ['A', 'A', 'A', 'A','B', 'B', 'B', 'B'], 
        'location': ['loc1', 'loc1', 'loc2', 'loc2','loc1', 'loc1', 'loc2', 'loc2'],
        'type': [1,2,1,2,1,2,1,2,],
         'v1': [2,4,6,8,10,12,14,16]}
data2 = {'category': ['A', 'A','B', 'B'], 
        'location': ['loc1', 'loc2', 'loc1','loc2'],
        'VALUE': [50,30,70,90]}    
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

df_temp = df1.set_index(['category','location'])
df_temp['New'] = df2.set_index(['category','location'])['VALUE']
df1 = df_temp.reset_index()

df1['Calculated_Field'] = df1['v1']/df1['New']
df1

Output:

References:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html#pandas.DataFrame.reset_index