How to remove a special character based on negative pattern matching using a regular expression

I have a sample string like this:

hello \ \\\world \ \\\\ this \234 \ is \Pattern\

and I want it to become:

hello \world this 234 is \Pattern

One way to do it is to loop over every character in the string and, if it's a \ and the next character is NOT a word character, replace it with a space. That is simple but inefficient. There must be another way to do it using a regular expression.
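For reference, the loop described above might look like the following minimal sketch. It assumes that "word character" means an ASCII letter (so the backslash in \234 is dropped, as in the desired output) and that runs of whitespace are collapsed afterwards. The function name strip_stray_backslashes is made up for illustration.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)

def strip_stray_backslashes(text: str) -> str:
    """Replace each backslash not followed by an ASCII letter with a space,
    then collapse runs of whitespace into single spaces."""
    out = []
    for i, ch in enumerate(text):
        # Empty string at the end of the text; never a member of the set.
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        if ch == "\\" and next_ch not in ASCII_LETTERS:
            out.append(" ")
        else:
            out.append(ch)
    return " ".join("".join(out).split())

# A raw string cannot end in a single backslash, so it is appended separately.
sample = r"hello \ \\\world \ \\\\ this \234 \ is \Pattern" + "\\"
print(strip_stray_backslashes(sample))  # hello \world this 234 is \Pattern
```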

I can find every backslash followed by word characters using r'\\\w+', and any single \ followed by whitespace using \\\s, but these don't take sequences like \\\ \( \ into consideration. How can this be done?

CodePudding user response：

Maybe use:

\\(?![A-Za-z])\s*

Then replace each match with an empty string.

\\ - A literal backslash (escaped);
(?![A-Za-z]) - Negative lookahead asserting that the next character is not an ASCII letter;
\s* - Zero or more whitespace characters (greedy).
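In Python, the pattern above could be applied as in this minimal sketch. The sample string is the one from the question, typed with single spaces; the names STRAY_BACKSLASH, sample and cleaned are illustrative only.

```python
import re

# A backslash not followed by an ASCII letter, plus any whitespace right after it.
STRAY_BACKSLASH = re.compile(r"\\(?![A-Za-z])\s*")

# A raw string cannot end in a single backslash, so the trailing one
# is appended separately.
sample = r"hello \ \\\world \ \\\\ this \234 \ is \Pattern" + "\\"

cleaned = STRAY_BACKSLASH.sub("", sample)
print(cleaned)  # hello \world this 234 is \Pattern
```

Note that \s* only consumes whitespace that directly follows a removed backslash, so whitespace elsewhere in the input is left as it is.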

CodePudding user response：

Try this regex:

\\(?=[\W\d]|$)

Substitute all the matches with an empty string.

Explanation

\\ - matches \
(?=[\W\d]|$) - positive lookahead asserting that the \ matched above is followed by a non-word character or a digit, or is at the end of the string. Every \ matched this way is replaced with an empty string.
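A minimal Python sketch of this substitution follows. Because the pattern matches only the backslash itself, the spaces around removed backslashes remain; the split/join step that collapses them is an addition for illustration, not part of the answer. The variable names are illustrative as well.

```python
import re

# A backslash followed by a non-word character, a digit, or the end of the string.
pattern = re.compile(r"\\(?=[\W\d]|$)")

# The trailing backslash is appended separately because a raw string
# cannot end in a single backslash.
sample = r"hello \ \\\world \ \\\\ this \234 \ is \Pattern" + "\\"

no_backslashes = pattern.sub("", sample)

# Collapse the leftover runs of spaces into single spaces.
cleaned = " ".join(no_backslashes.split())
print(cleaned)  # hello \world this 234 is \Pattern

# "_" is a word character and not a digit, so "\_" is left untouched.
print(pattern.sub("", r"a\_b"))  # a\_b
```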

CodePudding user response：

You can use a lookahead:

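import re

# Raw string, so every backslash below is literal
s = r"hello \  \\\world \  \\\\ this  \234 \ is \pattern\'"

# Remove each run of backslashes that is not followed by a letter
s2 = re.sub(r'\\*(?![a-zA-Z])', '', s)
print(s2)

output: hello   \world    this  234  is \pattern'

Only the backslashes are removed; the whitespace around them is left as it was.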

How the regex works:

\\*          # match any number of \
(?![a-zA-Z]) # if not followed by a letter
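One detail worth knowing: \\* can also match zero backslashes, so the pattern produces empty matches at positions that are not followed by a letter. With an empty replacement these are harmless. The sketch below, with illustrative variable names, compares it against a \\+ variant that requires at least one backslash.

```python
import re

# Same sample as in the answer (raw string, so backslashes are literal).
s = r"hello \  \\\world \  \\\\ this  \234 \ is \pattern\'"

# \\* may match zero backslashes; \\+ requires at least one.
with_star = re.sub(r"\\*(?![a-zA-Z])", "", s)
with_plus = re.sub(r"\\+(?![a-zA-Z])", "", s)

# Replacing an empty match with an empty string changes nothing,
# so both results are identical.
print(with_star == with_plus)  # True
```

The \\+ form states the intent more directly, and it matters as soon as the replacement is anything other than an empty string.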