Function to extract substring from a string with multiple delimiters - python

Time:01-25

I have a column of strings, some of which contain delimiters. I would like to write a function that extracts a substring only from the strings that contain the delimiters.

Current

EMAIL               TITLE
[email protected]   Marketing Analyst
[email protected]     501.Software Engineer.MG3 
[email protected]     Product Researcher
[email protected]    Managing Director
[email protected]    64.Legal Consultant.I44
[email protected]    Hardware Analyst.

I would like to extract the substring between the "." delimiters, but only for strings that contain them. Otherwise, the text should remain the same.

EMAIL               TITLE                       NEW_TITLE
[email protected]   Marketing Analyst           Marketing Analyst
[email protected]     501.Software Engineer.MG3   Software Engineer
[email protected]     Product Researcher          Product Researcher
[email protected]    Managing Director           Managing Director 
[email protected]    64.Legal Consultant.I44     Legal Consultant
[email protected]    Hardware Analyst.           Hardware Analyst.

I have tried to create a function with the following code, but it does not seem to be working:

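import re  # needed for re.search below

def clean_title(text):
    # Lazily capture the text between the first two dots, if there are two
    match = re.search(r"\.(.*?)\.", text)
    if match:
        return match.group(1)
    else:
        # Fewer than two dots: leave the title unchanged
        return text

df['NEW_TITLE'] = df['TITLE'].apply(clean_title)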

I would appreciate any form of help. Thank you!

CodePudding user response:

You can use a replacing approach:

df['NEW_TITLE'] = df['TITLE'].str.replace(r'^[^.]*\.([^.]+)\..*', r'\1', regex=True)

See the regex demo. The regex matches, in order:

  • ^ - start of string
  • [^.]* - zero or more non-dot chars
  • \. - a dot
  • ([^.]+) - Group 1: one or more non-dot chars
  • \. - a dot
  • .* - the rest of the line (any zero or more chars other than line break chars as many as possible)

And replaces with Group 1 value.
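
As a rough check of the pattern described above, here is a minimal sketch that applies it to the sample titles from the question using only the standard re module instead of pandas. The helper name extract_title is hypothetical. The pattern uses + as the Group 1 quantifier, following the "one or more non-dot chars" description. It is assumed here that str.replace with regex=True substitutes the same way re.sub does.

```python
import re

# Pattern from the answer: optional prefix, a dot, Group 1 (one or more
# non-dot chars), another dot, then the rest of the line.
PATTERN = re.compile(r'^[^.]*\.([^.]+)\..*')

def extract_title(text):
    """Return Group 1 when the pattern matches, else the text unchanged.

    extract_title is a hypothetical helper name used for illustration only.
    """
    # re.sub leaves the string untouched when there is no match.
    return PATTERN.sub(r'\1', text)

# Sample TITLE values taken from the question.
titles = [
    "Marketing Analyst",
    "501.Software Engineer.MG3",
    "Product Researcher",
    "Managing Director",
    "64.Legal Consultant.I44",
    "Hardware Analyst.",
]

new_titles = [extract_title(t) for t in titles]

for old, new in zip(titles, new_titles):
    print(f"{old!r} -> {new!r}")
```

A title with a single trailing dot, such as "Hardware Analyst.", is left as it is because Group 1 needs at least one non-dot character followed by a second dot.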
