I want to match each instance of character c that is not within a contiguous repeat of c that is N or more characters long.
In my case the character to match, c, is " and the number of repeats, N, is 3. Specifically, I'd like a single regex that matches a lone " character and both characters in "", but no characters in """. For my task it's easiest to use a few separate patterns, but I'm curious to learn a solution that will teach me more about regex. Ideally the solution would also scale to values of N greater than 3.
In this example, N=3 and c=":
"OK"
""
"""OK"""
The first four instances of " should be the only matches (no " on the third line should match).
The closest I've come is with a negative lookbehind and lookahead: (?<!"{2})"(?!"{2}). However, this still matches the middle character of a run of three quotes, and it doesn't scale to other values of N either.
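A quick way to see where that attempt goes wrong is to run it over the example. This is a sketch assuming Python's re module (which supports the fixed-width lookbehind used here); line_col is a hypothetical helper added only to report positions.

```python
import re

text = '"OK"\n""\n"""OK"""'

# The attempt from the question: a quote with no two quotes right before it
# and no two quotes right after it.
attempt = re.compile(r'(?<!"{2})"(?!"{2})')


def line_col(s, i):
    """Hypothetical helper: 1-based line and 0-based column of index i in s."""
    line = s.count("\n", 0, i) + 1
    col = i - (s.rfind("\n", 0, i) + 1)
    return line, col


matches = [line_col(text, m.start()) for m in attempt.finditer(text)]
print(matches)  # [(1, 0), (1, 3), (2, 0), (2, 1), (3, 1), (3, 6)]
```

The two extra hits are the middle quote of each """ on line 3: at those positions only one quote precedes and only one follows, so neither two-quote lookaround fires.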
Any insights would be much appreciated!
Answer:
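You can use lookaheads and lookbehinds, but instead of forbidding two quotes on either side, forbid a single quote on either side of the run you want to allow. The quantifier then has to consume the whole run, so a match can never start or end in the middle of a longer run: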
(?<!")""?(?!")
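Running this pattern over the same example, again assuming Python's re (any engine with lookbehind should agree), shows what it returns:

```python
import re

text = '"OK"\n""\n"""OK"""'

# One or two quotes, with no quote immediately before or after the run.
pattern = re.compile(r'(?<!")""?(?!")')

matches = [(m.start(), m.group()) for m in pattern.finditer(text)]
print(matches)  # [(0, '"'), (3, '"'), (5, '""')]
```

Three matches cover exactly the four expected characters. Nothing on the third line matches, because every candidate there either has a quote directly before it or leaves one directly after it.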
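More generally, to match runs of between M and K quotes (with M at least 1), use
(?<!")"{M,K}(?!")
For the question's threshold N, take M = 1 and K = N-1: with N = 3 this is (?<!")"{1,2}(?!"), which is equivalent to the pattern above. The same shape works for any character c, provided c is escaped if it is a regex metacharacter. Note that each run comes back as one match, so "" is a single two-character match rather than two separate ones.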
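A sketch of how this scales, assuming Python's re; short_runs and each_char are hypothetical helper names. short_runs builds the run-matching pattern for any character c and threshold n. each_char is a different technique that returns one match per character, as the question originally asked: for each of the n windows of width n that contain the current character, a negative lookahead (with a nested lookbehind) rejects the character if that whole window is c. Both rely on lookbehind, so they won't port to engines that lack it.

```python
import re


def short_runs(c, n):
    """Compile a pattern matching whole runs of c that are 1 to n-1 long."""
    if n < 2:
        raise ValueError("n must be at least 2")
    q = re.escape(c)
    # (?<!c) and (?!c) force the run to be maximal, so the quantifier can
    # never start or stop partway through a longer run.
    return re.compile(rf"(?<!{q}){q}{{1,{n - 1}}}(?!{q})")


def each_char(c, n):
    """Compile a pattern matching each single c not inside a run of n+ c's."""
    if n < 2:
        raise ValueError("n must be at least 2")
    q = re.escape(c)
    # Window starting at the current character: reject if it is n c's.
    guards = [f"(?!{q}{{{n}}})"]
    # Windows starting k characters earlier: the nested lookbehind checks the
    # k characters before, the lookahead checks the remaining n-k from here.
    guards += [f"(?!(?<={q}{{{k}}}){q}{{{n - k}}})" for k in range(1, n)]
    return re.compile("".join(guards) + q)


text = '"OK"\n""\n"""OK"""'
print(short_runs('"', 3).findall(text))                       # ['"', '"', '""']
print([m.start() for m in each_char('"', 3).finditer(text)])  # [0, 3, 5, 6]
```

The per-character version grows to n lookarounds, so the run-based pattern is simpler whenever getting each run as one match is acceptable.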
