Home > Software engineering >  Split a string column and put the splits in different columns
Split a string column and put the splits in different columns

Time:02-10

import pandas as pd
df = pd.read_csv("product_2022-02-10.csv")
df["primary_category_name_en"]

Result:

primary_category_name_en
Gadgets & Electronics > Desktop Computer & Computer Accessories > Connecting Cable & Unit Convertor > Lan Line
Electrical Appliances > Other Small Home Appliances & Accessories > Drill & Extension Cord & Battery > Battery & Battery Charger > Other Battery
Housewares > Hardware & Tools > Metal Tools & Hardware Tools > Switch
Electrical Appliances > Smart Home > Smart Extension Cord

How can I split by > and put it back to different columns?

I know how to split it. df["primary_category_name_en”].str.split(">")

There are some only 3 categories in a row / 4 categories in a row / 5 categories in a row. I don't know how to solve the problem in this.

For example:

ROW 1:
df["Primary_category_1"] = "Gadgets & Electronics"
df["Primary_category_2"] = "Desktop Computer & Computer Accessories"
df["Primary_category_3"] = "Connecting Cable & Unit Convertor"
df["Primary_category_4"] = "Lan Line"

ROW 2:
df["Primary_category_1"] = "Electrical Appliances"
df["Primary_category_2"] = "Other Small Home Appliances & Accessories"
df["Primary_category_3"] = "Drill & Extension Cord & Battery"
df["Primary_category_4"] = "Battery & Battery Charger"
df["Primary_category_5"] = "Other Battery"

ROW 3:
...

ROW4:
...

CodePudding user response:

Try using .str.split with expand=True:

split = df['primary_category_name_en'].str.split(' > ', expand=True).pipe(lambda x: x.set_axis(x.columns   1, axis=1)).add_prefix('Primary_category_').fillna('')

Output:

>>> split
      Primary_category_1                         Primary_category_2                 Primary_category_3         Primary_category_4 Primary_category_5
0  Gadgets & Electronics    Desktop Computer & Computer Accessories  Connecting Cable & Unit Convertor                   Lan Line                   
1  Electrical Appliances  Other Small Home Appliances & Accessories   Drill & Extension Cord & Battery  Battery & Battery Charger      Other Battery
2             Housewares                           Hardware & Tools       Metal Tools & Hardware Tools                     Switch                   
3  Electrical Appliances                                 Smart Home               Smart Extension Cord                                  

To add it back to your original dataframe, use pd.concat:

df = pd.concat([df, split], axis=1)

(Note that your code stores the dataframe in the variable pd which overwrites the pandas import. You should do df = pd.read_csv(...) instead of pd = pd.read_csv(...).)

  •  Tags:  
  • Related