Efficient way to lookup values in other dataframe-CodePudding

I am looking up the prices of products based on id_number in another DataFrame called df and adding the prices of each product in my df_pair DataFrame. I am doing this right now in the following way:

df_pair['price_p1'] = df_pair['p1'].progress_apply(lambda x: df[df['id_number'] == x]['price'].iloc[0])

But it seems, it's too slow. Is there any other better solution to solve this? My two DataFrame sample: df:

id_number	price	...
B0000D9MYM	12.5
B0000D9MYL	25.26
B00ADHIGBA	8.9
B00H9OX598	10.56
...	....

df_pair looks like below:

p1	...
B002HQCWYM
B00H9OX598
B0000D9MYL
...

Note: the indices or the order of the id_numbers and p1 values are not the same.

CodePudding user response：

df_pair.merge(df.rename(columns={"id_number":"p1"}), how="left")

you can merge 2 dataframes by merge func but need same column name. (or use left_on & right_on)

my eng is not good. so hard to exlain long. if you have more question. tell me.

if df has many columns, slice df and use df[["id_number", "price"]] instead of df

and how can i make grey block on df? i don want make bold anymore

CodePudding user response：

Assuming id_number is the index of df you can do something like:

df_pair.join(df, on='p1')['price'].sum()

A join is a standard database operation that basically merges two data frames by linking a column of one to the index of other.

CodePudding user response：

Another option is to set_index with "id_number" and map the price:

df_pair['price'] = df_pair['p1'].map(df1.set_index('id_number')['price'])

Output:

           p1  ...  price
0  B002HQCWYM  ...    NaN
1  B00H9OX598  ...  10.56
2  B0000D9MYL  ...  25.26
3         ...