I have a dataset catalog with 3 columns: product id, brand name and product class.
import pandas as pd
catalog = {'product_id': [1, 2, 3, 1, 2, 4, 3, 5, 6],
'brand_name': ['FW', 'GW', 'FK','FW','GW','WU','FK','MU', 'AS'],
'product_class': ['ACCESSORIES', 'DRINK', 'FOOD', 'ACCESSORIES', 'DRINK', 'FURNITURE','FOOD', 'ELECTRONICS', 'APPAREL']}
df = pd.DataFrame(data=catalog)
Assume I have a list of product id prod = [1,3,4]. Now, with Python, I want to list all the brand names corresponding to this list prod based on the product_id. How can I do this using only groupby() and get_group() functions? I can do this using pd.DataFrame() combined with the zip() function, but it is too inefficient, as I would need to obtain each column individually.
Expected output (in dataframe)
Product_id Brand_name
1 'FW'
3 'FK'
4 'WU'
Can anyone give some help on this?
CodePudding user response:
You can use pandas functions isin() and drop_duplicates() to achieve this:
prod = [1,3,4]
print(df[df.product_id.isin(prod)][["product_id", "brand_name"]].drop_duplicates())
Output:
product_id brand_name
0 1 FW
2 3 FK
5 4 WU
