I need a Python regex that matches overlapping parts of a string:
My String: aa-bbb-c-dd
I would like to have groups like this:
aa-bbb
bbb-c
c-dd
Does somebody have an idea on how to do this?
CodePudding user response:
You can use lookahead to get overlapping matches:
(?=\b([A-Za-z]+-[A-Za-z]+)\b)
See the regex demo.
Details:
(?= - start of a positive lookahead that matches a location that is immediately followed with
\b - a word boundary
([A-Za-z]+-[A-Za-z]+) - Group 1: one or more ASCII letters, -, one or more ASCII letters
\b - a word boundary
) - end of the lookahead.
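To see why the lookahead matters, here is a small sketch that compares a plain consuming pattern with the lookahead version on the string from the question. The variable names consuming and overlapping are only for illustration.

```python
import re

text = "aa-bbb-c-dd"

# A plain (consuming) pattern: each match eats its text, so the next
# search starts after it and "bbb-c" is never found.
consuming = re.findall(r'\b[A-Za-z]+-[A-Za-z]+\b', text)
print(consuming)    # ['aa-bbb', 'c-dd']

# The same pattern inside a lookahead with a capturing group: the match
# itself is empty, so the engine retries at every position in the string.
overlapping = re.findall(r'(?=\b([A-Za-z]+-[A-Za-z]+)\b)', text)
print(overlapping)  # ['aa-bbb', 'bbb-c', 'c-dd']
```

Since the lookahead consumes no characters, re.findall returns only the contents of Group 1 for each position where the lookahead succeeds.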
In Python, use it with re.findall:
import re
text = "aaaa-bb-ccc-dd"
print(re.findall(r'(?=\b([A-Z]+-[A-Z]+)\b)', text, re.I))
# => ['aaaa-bb', 'bb-ccc', 'ccc-dd']
See the Python demo. Note I changed [A-Za-z] to [A-Z] in the code since I made the regex matching case insensitive with the help of the re.I option. Make sure you are using the r string literal prefix or \b will be treated as a BACKSPACE char, \x08, and not a word boundary.
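The raw string point can be checked with a quick sketch. The names raw_pattern and plain_pattern are made up for this illustration; the second one deliberately omits the r prefix.

```python
import re

text = "aa-bbb-c-dd"

raw_pattern = r'(?=\b([A-Z]+-[A-Z]+)\b)'   # \b reaches the regex engine as a word boundary
plain_pattern = '(?=\b([A-Z]+-[A-Z]+)\b)'  # Python turns \b into a backspace char first

# In a regular string literal, '\b' is the single character \x08.
print('\b' == '\x08')  # True

raw_result = re.findall(raw_pattern, text, re.I)
plain_result = re.findall(plain_pattern, text, re.I)

print(raw_result)    # ['aa-bbb', 'bbb-c', 'c-dd']
print(plain_result)  # [] since the text contains no backspace characters
```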
Variations
(?=\b([^\W\d_]+-[^\W\d_]+)\b) - matching any Unicode letters
(?=(?<![^\W\d_])([^\W\d_]+-[^\W\d_]+)(?![^\W\d_])) - matching any Unicode letters and the boundaries are any non-letters
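A rough sketch of how the two variations differ. The sample strings and the names word_bounded and letter_bounded are assumptions chosen for illustration. The first variation still relies on \b, so a digit next to a letter blocks the match, while the second only requires that the neighbouring characters are not letters.

```python
import re

word_bounded = r'(?=\b([^\W\d_]+-[^\W\d_]+)\b)'
letter_bounded = r'(?=(?<![^\W\d_])([^\W\d_]+-[^\W\d_]+)(?![^\W\d_]))'

# Non-ASCII letters: [A-Za-z] finds nothing, [^\W\d_] matches them.
cyrillic = "альфа-бета-гамма"
ascii_only = re.findall(r'(?=\b([A-Za-z]+-[A-Za-z]+)\b)', cyrillic)
unicode_letters = re.findall(word_bounded, cyrillic)
print(ascii_only)       # []
print(unicode_letters)  # ['альфа-бета', 'бета-гамма']

# Digits glued to the letters: there is no \b between "1" and "a",
# but "1" and "2" are non-letters, so only the second variation matches.
glued = "1ab-cd2"
with_word_boundary = re.findall(word_bounded, glued)
with_letter_boundary = re.findall(letter_bounded, glued)
print(with_word_boundary)    # []
print(with_letter_boundary)  # ['ab-cd']
```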
