Suppose I have the following text:
Lorem-Ipsum is simply dummy text of-the printing and typesetting industry. abc123-xyz 1abcc-xy-ef apple.pear-banana asdddd-abc-cba
I want to replace each - with a space, but only inside words made up entirely of letters and hyphens ([a-zA-Z-]), where a word runs from one whitespace (or the start/end of the text) to the next. So the result should be:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. abc123-xyz 1abcc-xy-ef apple.pear-banana asdddd abc cba
I tried:
\b(?<=[a-zA-Z]+)\-(?=[a-zA-Z]+)\b
This is not valid because a lookbehind must be fixed-width, so quantifiers such as + are not allowed inside it. I suspect that even if it worked, it wouldn't cover all the cases.
Is there a way to use variable-length lookbehinds, or is there another way to handle this case?
Edit: I'm using Python's re library.
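Both points in the question can be checked quickly. This is a sketch assuming Python's built-in re module: compiling a variable-width lookbehind fails, while a single-character (fixed-width) lookaround compiles but only inspects the characters right next to each hyphen, so it also rewrites tokens that should stay untouched. The names variable_width_ok and naive are illustrative only.

```python
import re

text = ("Lorem-Ipsum is simply dummy text of-the printing and typesetting industry. "
        "abc123-xyz 1abcc-xy-ef apple.pear-banana asdddd-abc-cba")

# Python's built-in re module rejects variable-width lookbehinds at compile time.
try:
    re.compile(r'(?<=[a-zA-Z]+)-')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("re.error:", exc)  # e.g. "look-behind requires fixed-width pattern"

# A one-character lookbehind/lookahead compiles, but it only checks the neighbours
# of each hyphen, so "1abcc-xy-ef" and "apple.pear-banana" get rewritten too.
naive = re.sub(r'(?<=[a-zA-Z])-(?=[a-zA-Z])', ' ', text)
print(naive)
```

That is why the word boundaries have to be checked on the whole whitespace-delimited word rather than around each hyphen.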
Answer:
You can use
re.sub(r'(?<!\S)[a-zA-Z]+(?:-[a-zA-Z]+)+(?!\S)', lambda x: x.group().replace('-', ' '), text)
The regex matches whitespace-delimited words that consist only of letters and contain at least one -. The lambda then replaces every hyphen inside each match with a space.
Details:
(?<!\S) - left-hand whitespace boundary
[a-zA-Z]+ - one or more ASCII letters
(?:-[a-zA-Z]+)+ - one or more occurrences of a - char followed by one or more ASCII letters
(?!\S) - right-hand whitespace boundary.
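The token breakdown can also live inside the pattern itself. This is a sketch using the standard re.VERBOSE flag, so whitespace in the pattern is ignored and # starts a comment; HYPHENATED_WORD and dehyphenate are hypothetical names, not part of the answer.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so each token is commented.
HYPHENATED_WORD = re.compile(r"""
    (?<!\S)           # left-hand whitespace boundary (start of text or after whitespace)
    [a-zA-Z]+         # one or more ASCII letters
    (?:-[a-zA-Z]+)+   # one or more groups of a hyphen followed by letters
    (?!\S)            # right-hand whitespace boundary (end of text or before whitespace)
""", re.VERBOSE)

def dehyphenate(text: str) -> str:
    # Replace the hyphens only inside whole matched words.
    return HYPHENATED_WORD.sub(lambda m: m.group().replace('-', ' '), text)

print(dehyphenate("a-b c1-d e.f-g x-y-z."))
```

Note that (?!\S) requires whitespace or the end of the text after the word, so a hyphenated word with attached punctuation, such as "x-y-z.", is left unchanged.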
Replace [a-zA-Z] with [^\W\d_] to match words made of Unicode letters, not just ASCII ones.
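A small comparison of the two character classes, as a sketch: in a str pattern, \w is Unicode-aware by default, so [^\W\d_] means "a word character that is not a digit and not an underscore", which in practice is roughly a letter. The sample string and the names ASCII_HYPHENATED, UNICODE_HYPHENATED and spaced are made up for illustration.

```python
import re

ASCII_HYPHENATED = re.compile(r'(?<!\S)[a-zA-Z]+(?:-[a-zA-Z]+)+(?!\S)')
# [^\W\d_]: word characters minus digits and underscore (roughly, Unicode letters).
UNICODE_HYPHENATED = re.compile(r'(?<!\S)[^\W\d_]+(?:-[^\W\d_]+)+(?!\S)')

sample = "café-crème naïve-test abc1-x"

def spaced(m):
    # Turn every hyphen in the matched word into a space.
    return m.group().replace('-', ' ')

print(ASCII_HYPHENATED.sub(spaced, sample))    # café-crème naïve-test abc1-x
print(UNICODE_HYPHENATED.sub(spaced, sample))  # café crème naïve test abc1-x
```

The ASCII version skips "café-crème" and "naïve-test" because é and ï are outside [a-zA-Z]; both versions still leave "abc1-x" alone because of the digit.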
See the Python demo:
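import re
text = r"Lorem-Ipsum is simply dummy text of-the printing and typesetting industry. abc123-xyz 1abcc-xy-ef apple.pear-banana asdddd-abc-cba"
# Match whole whitespace-delimited words made of letters with at least one hyphen,
# then replace every hyphen inside each match with a space
print(re.sub(r'(?<!\S)[a-zA-Z]+(?:-[a-zA-Z]+)+(?!\S)', lambda x: x.group().replace('-', ' '), text))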
Output:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. abc123-xyz 1abcc-xy-ef apple.pear-banana asdddd abc cba
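If you would rather not rely on lookarounds at all, one alternative is to apply the same rule per whitespace-delimited token. This sketch assumes that splitting on runs of whitespace gives the intended word boundaries; WORD and dehyphenate_tokens are hypothetical names. re.split with a capturing group keeps the separators, so the original spacing is preserved.

```python
import re

# A token qualifies if it is letters with at least one hyphen between letter runs.
WORD = re.compile(r'[a-zA-Z]+(?:-[a-zA-Z]+)+')

def dehyphenate_tokens(text: str) -> str:
    # The capturing group makes re.split return the whitespace runs as well.
    parts = re.split(r'(\s+)', text)
    return ''.join(p.replace('-', ' ') if WORD.fullmatch(p) else p for p in parts)

text = ("Lorem-Ipsum is simply dummy text of-the printing and typesetting industry. "
        "abc123-xyz 1abcc-xy-ef apple.pear-banana asdddd-abc-cba")
print(dehyphenate_tokens(text))
```

Because fullmatch must cover the whole token, this behaves like the (?<!\S)...(?!\S) version, just with the boundary check moved out of the regex.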
