I am having some difficulty writing a regex that finds words in a text that contain zz, but not at the start or the end of the word. These are two of my many attempts:
pattern = re.compile(r'(?!(?:z){2})[a-z]*zz[a-z]*(?!(?:z){2})')
pattern = re.compile(r'\b[^z\s\d_]{2}[a-z]*zz[a-y][a-z]*(?!(?:zz))\b')
Thanks
CodePudding user response:
Well, the direct translation would be
\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b
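As a quick check, here is a minimal sketch that runs this pattern through Python's re module, assuming each group is quantified with + (one or more word characters before and after the zz). The sample sentence and the variable names are only for illustration.

```python
import re

# Assumption: both groups carry a `+` quantifier, so at least one
# word character must come before and after the "zz".
pattern = re.compile(r'\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b')

text = "lorem ipsum buzz mezzo mix zztop but this is all"

# findall returns whole matches here because the pattern has
# no capturing groups, only non-capturing ones.
matches = pattern.findall(text)
print(matches)
```

On this sample only mezzo should be reported, since buzz ends in zz and zztop starts with it.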
Programmatically, you could use
text = "lorem ipsum buzz mezzo mix zztop but this is all"
words = [word
         for word in text.split()
         if not (word.startswith("zz") or word.endswith("zz")) and "zz" in word]
print(words)
Which yields
['mezzo']
See a demo on ideone.com.
CodePudding user response:
Another idea is to use non-word boundaries.
\B matches at any position between two word characters as well as at any position between two non-word characters ...
\w*\Bzz\B\w*
Be aware that the pattern above also matches words containing a run of more than two z. For exactly two:
\w*(?<=[^\Wz])zz(?=[^\Wz])\w*
Use any of those patterns with (?i) flag for caseless matching if needed.
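To see how the two patterns differ, the following sketch runs both over a made-up sample and also tries the inline (?i) flag. The sample words and variable names are assumptions chosen for illustration, not part of the answer.

```python
import re

text = "lorem buzz mezzo pizzzas zztop zzzap"

# Looser pattern: "zz" needs a word character on both sides,
# and that neighbour may itself be a "z".
loose = re.compile(r'\w*\Bzz\B\w*')

# Stricter pattern: both neighbours of "zz" must be word
# characters other than "z".
exact = re.compile(r'\w*(?<=[^\Wz])zz(?=[^\Wz])\w*')

loose_matches = loose.findall(text)
exact_matches = exact.findall(text)

# The inline (?i) flag turns on case-insensitive matching.
caseless_matches = re.findall(r'(?i)\w*\Bzz\B\w*', "Mezzo BUZZ PIZZA")

print(loose_matches)
print(exact_matches)
print(caseless_matches)
```

The looser pattern also accepts pizzzas and zzzap, because one z of a longer run can serve as the neighbouring word character. The stricter pattern keeps only mezzo.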
CodePudding user response:
You can use lookarounds:
\b(?!zz)\w+?zz\w+\b(?<!zz)
or not:
\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b
Limited to ASCII letters this last pattern can also be written:
\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b
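A small sketch comparing the three patterns in Python, assuming the lookaround version uses the quantifiers \w+? and \w+. The sample sentence and the dictionary keys are hypothetical names chosen for this example.

```python
import re

# Assumption: the lookaround variant reads \w+?zz\w+ (lazy prefix, greedy suffix).
patterns = {
    "lookarounds": r'\b(?!zz)\w+?zz\w+\b(?<!zz)',
    "no_lookarounds": r'\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b',
    "ascii_only": r'\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b',
}

text = "lorem ipsum buzz mezzo mix zztop pizza"

# Run every pattern over the same text and collect the matched words.
results = {name: re.findall(p, text) for name, p in patterns.items()}

for name, found in results.items():
    print(name, found)
```

On plain lowercase words like these, all three variants should agree. They can differ on words containing digits or underscores, which \w accepts but [a-z] does not.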
CodePudding user response:
Your criteria just mean that the first and last letters cannot be z. So we simply have to make sure that neither the first nor the last letter is z, and that there is a zz somewhere in between.
Something like
^[^z].*zz.*[^z]$
should work
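Because ^ and $ anchor the whole string, one way to use this pattern on running text is to test each word separately. The sketch below assumes whitespace-separated words. The sample sentence and variable names are illustrative only.

```python
import re

pattern = re.compile(r'^[^z].*zz.*[^z]$')

text = "lorem ipsum buzz mezzo mix zztop but this is all"

# Test each whitespace-separated word on its own, so that the
# ^ and $ anchors refer to the start and end of a single word.
words = [word for word in text.split() if pattern.match(word)]
print(words)

# Applied to the full sentence, the anchors refer to the whole
# string, so the sentence itself is what gets matched.
whole_text_match = pattern.search(text)
print(whole_text_match is not None)
```

Splitting first keeps the check per word. Without it, any text that contains zz and neither starts nor ends with z would match as a whole.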
