Writing a regex that finds 'zz' in a word but not at the start or the end

I am having some difficulty writing a regex that finds words in a text that contain zz, but not at the start or the end of the word. These are two of my many attempts:

pattern = re.compile(r'(?!(?:z){2})[a-z]*zz[a-z]*(?!(?:z){2})')
pattern = re.compile(r'\b[^z\s\d_]{2}[a-z]*zz[a-y][a-z]*(?!(?:zz))\b')

Thanks

CodePudding user response：

Well, the direct translation would be

\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b

See a demo on regex101.com.
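
In Python, a pattern of this shape could be applied with `re.findall`. This is only a minimal sketch: it assumes the `+` quantifiers after both `\w` groups and reuses the sample sentence from this answer.

```python
import re

# Assumed form of the pattern: one or more word characters on each side of "zz",
# with (?!zz) rejecting a leading "zz" and (?!zz\b) rejecting a trailing "zz".
pattern = re.compile(r'\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b')

text = "lorem ipsum buzz mezzo mix zztop but this is all"

# All groups are non-capturing, so findall returns the full matched words.
matches = pattern.findall(text)
print(matches)  # ['mezzo']
```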

Programmatically, you could use

text = "lorem ipsum buzz mezzo mix zztop but this is all"

words = [word 
         for word in text.split()
         if not (word.startswith("zz") or word.endswith("zz")) and "zz" in word]

print(words)

Which yields

['mezzo']

See a demo on ideone.com.

CodePudding user response：

Another idea is to use non-word boundaries.

\B matches at any position between two word characters as well as at any position between two non-word characters ...

\w*\Bzz\B\w*

See this demo at regex101

Be aware that the pattern above also matches words containing a run of more than two z's. To require exactly two:

\w*(?<=[^\Wz])zz(?=[^\Wz])\w*

Another demo at regex101

Use any of these patterns with the (?i) flag for caseless matching if needed.
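
To see how the two `\B`-based patterns differ, here is a small Python sketch. The sample text is made up for illustration, and `re.IGNORECASE` is used as the Python equivalent of the `(?i)` flag.

```python
import re

# "zz" must sit strictly inside a word; longer runs such as "zzz" are accepted.
loose = re.compile(r'\w*\Bzz\B\w*', re.IGNORECASE)

# "zz" must be surrounded by word characters other than "z": exactly two z's.
exact = re.compile(r'\w*(?<=[^\Wz])zz(?=[^\Wz])\w*', re.IGNORECASE)

text = "Buzz meZZo zztop pizzzas"  # hypothetical sample text

loose_matches = loose.findall(text)
exact_matches = exact.findall(text)

print(loose_matches)  # ['meZZo', 'pizzzas']
print(exact_matches)  # ['meZZo']
```

On this sample, the first pattern accepts `pizzzas` because some `zz` inside it has word characters on both sides, while the second rejects it because every `zz` in that word touches another `z`.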

CodePudding user response：

You can use lookarounds:

\b(?!zz)\w+?zz\w+\b(?<!zz)

or not:

\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b

Limited to ASCII letters, this last pattern can also be written as:

\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b
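
As a rough check that these variants agree, the following Python sketch runs them over a made-up word list. It assumes the lookaround version is written with `\w+?` and `\w+`; the sample text is hypothetical.

```python
import re

# Lookaround version (assumed quantifiers: \w+? before "zz", \w+ after it).
with_lookarounds = re.compile(r'\b(?!zz)\w+?zz\w+\b(?<!zz)')

# Same idea without lookarounds: an optional single "z" at either edge,
# followed (or preceded) by a word character that is not "z".
without_lookarounds = re.compile(r'\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b')

# ASCII-only variant of the pattern without lookarounds.
ascii_only = re.compile(r'\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b')

text = "buzz mezzo zztop zazzaz pizza azz"  # hypothetical sample text

results = [p.findall(text) for p in (with_lookarounds, without_lookarounds, ascii_only)]
for found in results:
    print(found)  # ['mezzo', 'zazzaz', 'pizza'] each time
```

Note that `zazzaz` is accepted by all three: a single `z` at the start or end is allowed, only a leading or trailing `zz` is excluded.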

CodePudding user response：

Your criteria just mean that the first and last letters cannot be z. So we simply have to make sure that neither the first nor the last letter is z, and that there is a zz somewhere in between.

Something like

^[^z].*zz.*[^z]$

should work
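
Because `^` and `$` anchor to the whole string, a pattern like this has to be tested against one word at a time. A minimal Python sketch, assuming the text is split on whitespace first and reusing the sample sentence from the first answer:

```python
import re

pattern = re.compile(r'^[^z].*zz.*[^z]$')

text = "lorem ipsum buzz mezzo mix zztop but this is all"

# Test each whitespace-separated word on its own.
words = [word for word in text.split() if pattern.search(word)]
print(words)  # ['mezzo']

# Applied to the whole sentence, the anchors refer to the sentence itself,
# so it matches as one string (starts with "l", ends with "l", contains "zz").
whole_text_matches = pattern.search(text) is not None
print(whole_text_matches)  # True

# A word with a single "z" at either edge is rejected as well.
single_z_edge = pattern.search("zazzaz") is not None
print(single_z_edge)  # False
```

This makes the pattern stricter than "does not start or end with zz": a word such as `zazzaz` is rejected here, while the patterns in the other answers accept it.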