I have a pandas DataFrame as follows:
df
A    B
GRA  x - kuaiau;;a; y - kaj;jsuik;;;ou; yy -ll'jkusj;;l;
GRB  xx -iusiksu;;a; z - kuayatik;;;ou;
GRC  tt - hay;hayh;
GRD  NA
GRE
Whenever column B is not null, I want to split its strings on the delimiter '-' and collect the two groups (the labels before each '-' and the values after it) into two columns, as shown.
Expected output:
df_final
A    C       D
GRA  x y yy  kuaiau;;a; kaj;jsuik;;;ou; ll'jkusj;;l;
GRB  xx z    iusiksu;;a; kuayatik;;;ou;
GRC  tt      hay;hayh;
GRD  NA      NA
GRE
I am able to split the column on the delimiter '-'. However, because the number of '-' characters varies from row to row, I cannot concatenate the pieces properly to get the desired output, and I cannot get it to work at all when the column is NULL, blank, or NA.
Any help would be appreciated.
CodePudding user response:
We can use str.replace here with appropriate regex patterns:
df["C"] = df["B"].str.replace(r'\s*-\s*\S+;+\S+;+\s*', ' ', regex=True).str.strip()
df["D"] = df["B"].str.replace(r'\w+\s*-\s*', '', regex=True)
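For reference, here is a minimal sketch of this approach run against the sample data from the question. It assumes the patterns are meant to use `+` quantifiers, that `regex=True` is passed explicitly (pandas 2.0 and later treat the pattern as a literal string by default), that GRD holds the literal string "NA", and that GRE is a genuinely missing value. The frame below is reconstructed from the question, so adjust it to your real data.

```python
import pandas as pd

# Sample frame reconstructed from the question (assumption: "NA" is a literal
# string in row GRD and row GRE is a real missing value).
df = pd.DataFrame({
    "A": ["GRA", "GRB", "GRC", "GRD", "GRE"],
    "B": [
        "x - kuaiau;;a; y - kaj;jsuik;;;ou; yy -ll'jkusj;;l;",
        "xx -iusiksu;;a; z - kuayatik;;;ou;",
        "tt - hay;hayh;",
        "NA",
        None,
    ],
})

# Labels: replace every "- value" chunk with a space, leaving only the text
# in front of each dash, then trim the trailing space.
df["C"] = df["B"].str.replace(r"\s*-\s*\S+;+\S+;+\s*", " ", regex=True).str.strip()

# Values: remove every "label -" prefix, leaving only the text after each dash.
df["D"] = df["B"].str.replace(r"\w+\s*-\s*", "", regex=True)

df_final = df[["A", "C", "D"]]
print(df_final)
```

Because the `.str` methods skip missing values, the GRE row stays missing in both new columns, and a string without any '-' (such as "NA" or an empty string) passes through unchanged since neither pattern matches it.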
