import pandas as pd
df = pd.read_csv("product_2022-02-10.csv")
df["primary_category_name_en"]
Result:
primary_category_name_en
Gadgets & Electronics > Desktop Computer & Computer Accessories > Connecting Cable & Unit Convertor > Lan Line
Electrical Appliances > Other Small Home Appliances & Accessories > Drill & Extension Cord & Battery > Battery & Battery Charger > Other Battery
Housewares > Hardware & Tools > Metal Tools & Hardware Tools > Switch
Electrical Appliances > Smart Home > Smart Extension Cord
How can I split by > and put it back to different columns?
I know how to split it. df["primary_category_name_en”].str.split(">")
There are some only 3 categories in a row / 4 categories in a row / 5 categories in a row. I don't know how to solve the problem in this.
For example:
ROW 1:
df["Primary_category_1"] = "Gadgets & Electronics"
df["Primary_category_2"] = "Desktop Computer & Computer Accessories"
df["Primary_category_3"] = "Connecting Cable & Unit Convertor"
df["Primary_category_4"] = "Lan Line"
ROW 2:
df["Primary_category_1"] = "Electrical Appliances"
df["Primary_category_2"] = "Other Small Home Appliances & Accessories"
df["Primary_category_3"] = "Drill & Extension Cord & Battery"
df["Primary_category_4"] = "Battery & Battery Charger"
df["Primary_category_5"] = "Other Battery"
ROW 3:
...
ROW4:
...
CodePudding user response:
Try using .str.split with expand=True:
split = df['primary_category_name_en'].str.split(' > ', expand=True).pipe(lambda x: x.set_axis(x.columns 1, axis=1)).add_prefix('Primary_category_').fillna('')
Output:
>>> split
Primary_category_1 Primary_category_2 Primary_category_3 Primary_category_4 Primary_category_5
0 Gadgets & Electronics Desktop Computer & Computer Accessories Connecting Cable & Unit Convertor Lan Line
1 Electrical Appliances Other Small Home Appliances & Accessories Drill & Extension Cord & Battery Battery & Battery Charger Other Battery
2 Housewares Hardware & Tools Metal Tools & Hardware Tools Switch
3 Electrical Appliances Smart Home Smart Extension Cord
To add it back to your original dataframe, use pd.concat:
df = pd.concat([df, split], axis=1)
(Note that your code stores the dataframe in the variable pd which overwrites the pandas import. You should do df = pd.read_csv(...) instead of pd = pd.read_csv(...).)
