Parsing a pandas DataFrame by delimiter

Time:02-10

I have the following problem:

I have a pandas DataFrame as follows:

      df

       A                    B
      GRA                 x - kuaiau;;a; y - kaj;jsuik;;;ou; yy -ll'jkusj;;l;
      GRB                 xx -iusiksu;;a; z - kuayatik;;;ou;
      GRC                 tt - hay;hayh;
      GRD                 NA
      GRE

What I want is: whenever column B is not null, split its string on the delimiter '-' and collect the two groups (the keys before each '-' and the values after it) into two columns, C and D, as shown.

Expected output:

  df_final

       A                    C                            D
      GRA                  x y yy              kuaiau;;a; kaj;jsuik;;;ou; ll'jkusj;;l;
      GRB                  xx z                iusiksu;;a; kuayatik;;;ou; 
      GRC                  tt                  hay;hayh; 
      GRD                  NA                  NA
      GRE                 
      

I am able to split the column on the delimiter '-'. However, since the number of '-' occurrences varies from row to row, I am not able to concatenate the pieces properly to get the desired output, and I cannot make it work at all when the column is NULL, blank, or NA.

Any help whatsoever would be appreciated.

CodePudding user response:

We can use str.replace here with appropriate regex patterns:

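# regex=True is needed in pandas >= 2.0, where str.replace defaults to literal matching
# C: remove each "- value" chunk so that only the keys remain
df["C"] = df["B"].str.replace(r'\s*-\s*\S+;+\S+;+\s*', ' ', regex=True).str.strip()
# D: remove each "key -" prefix so that only the values remain
df["D"] = df["B"].str.replace(r'\w+\s*-\s*', '', regex=True)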
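
To check the approach end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the two replacements. It makes some assumptions that are not in the original post: GRD is taken to hold a real missing value (NaN) and GRE an empty string, the quantifiers in the patterns are written as `+`, `regex=True` is passed explicitly, and an extra `.str.strip()` is applied to D to drop trailing spaces.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
# Assumption: GRD is a true missing value (NaN) and GRE is an empty string.
df = pd.DataFrame({
    "A": ["GRA", "GRB", "GRC", "GRD", "GRE"],
    "B": [
        "x - kuaiau;;a; y - kaj;jsuik;;;ou; yy -ll'jkusj;;l;",
        "xx -iusiksu;;a; z - kuayatik;;;ou; ",
        "tt - hay;hayh; ",
        np.nan,
        "",
    ],
})

# C: replace every "- value" chunk with a single space, leaving only the keys.
df["C"] = (
    df["B"]
    .str.replace(r"\s*-\s*\S+;+\S+;+\s*", " ", regex=True)
    .str.strip()
)

# D: delete every "key -" prefix, leaving only the values.
df["D"] = (
    df["B"]
    .str.replace(r"\w+\s*-\s*", "", regex=True)
    .str.strip()
)

df_final = df[["A", "C", "D"]]
print(df_final)
```

Because the `.str` accessor passes missing values through untouched, the NaN row stays NaN and the empty row stays empty, so no separate null check is needed. If "NA" is stored as a literal string instead, neither pattern matches it and it is copied unchanged into both C and D.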