I am looking up the prices of products based on id_number in another DataFrame called df and adding the prices of each product in my df_pair DataFrame. I am doing this right now in the following way:
df_pair['price_p1'] = df_pair['p1'].progress_apply(lambda x: df[df['id_number'] == x]['price'].iloc[0])
But it seems, it's too slow. Is there any other better solution to solve this?
My two DataFrame sample:
df:
| id_number | price | ... |
|---|---|---|
| B0000D9MYM | 12.5 | |
| B0000D9MYL | 25.26 | |
| B00ADHIGBA | 8.9 | |
| B00H9OX598 | 10.56 | |
| ... | .... |
df_pair looks like below:
| p1 | ... |
|---|---|
| B002HQCWYM | |
| B00H9OX598 | |
| B0000D9MYL | |
| ... |
Note: the indices or the order of the id_numbers and p1 values are not the same.
CodePudding user response:
df_pair.merge(df.rename(columns={"id_number":"p1"}), how="left")
you can merge 2 dataframes by merge func but need same column name. (or use left_on & right_on)
my eng is not good. so hard to exlain long. if you have more question. tell me.
if df has many columns, slice df and use df[["id_number", "price"]] instead of df
and how can i make grey block on df? i don want make bold anymore
CodePudding user response:
Assuming id_number is the index of df you can do something like:
df_pair.join(df, on='p1')['price'].sum()
A join is a standard database operation that basically merges two data frames by linking a column of one to the index of other.
CodePudding user response:
Another option is to set_index with "id_number" and map the price:
df_pair['price'] = df_pair['p1'].map(df1.set_index('id_number')['price'])
Output:
p1 ... price
0 B002HQCWYM ... NaN
1 B00H9OX598 ... 10.56
2 B0000D9MYL ... 25.26
3 ...
