I have a term, say dog, and a predefined set of tags, for example red and big. I'm trying to write a regexp that matches only valid strings: those that contain any combination of the tags before the term, where each tag appears zero or one time. The order of the tags does not matter.
Examples of strings that should match:
dog
red dog
red big dog
big red dog
Examples of strings that should not match:
red red dog
red big red dog
small red dog
The direct approach of enumerating every possible combination becomes a nightmare with dozens of tags.
This is where I've stopped for now:
/
(?: # group for repetition
(
red\s | big\s # a tag that ...
)(?! \1 ) # ... is not followed by itself
# > (replacing backref with a recursive backref
# > still doesn't work,
# > changing negative lookahead by a positive
# > still gives same undesired match on invalid strings)
){0,2} # such a term repeated 0 to [amount of terms] times
dog # followed by a 'dog'
/xs
This regexp matches all of the example strings, including the ones that should be rejected.
CodePudding user response:
You may use this regex:
^(?!.*\b(big|red)\h.*\b\1\b)(?:big\h+|red\h+)*dog$
RegEx Details:
^: Start
(?!.*\b(big|red)\h.*\b\1\b): Fail the match if either keyword appears more than once
(?:big\h+|red\h+)*: Match 0 or more of the words big or red, each followed by 1+ horizontal whitespace characters
dog: Match dog
$: End
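If you have dozens of tags, you probably don't want to type the alternation by hand. Here is a minimal Python sketch of the same idea that builds the pattern from a list. Two assumptions on my side: Python's re module does not support \h, so I substituted [ \t] for it, and build_tag_pattern is just a hypothetical helper name for illustration, not part of any library.

```python
import re


def build_tag_pattern(term, tags):
    """Compile a regex for `term` preceded by any subset of `tags`, each used at most once."""
    # Escape every tag so characters like "." or "+" are treated literally.
    alt = "|".join(re.escape(t) for t in tags)
    pattern = (
        rf"^(?!.*\b({alt})[ \t].*\b\1\b)"  # fail if any tag shows up a second time
        rf"(?:(?:{alt})[ \t]+)*"            # zero or more tags, each followed by spaces/tabs
        rf"{re.escape(term)}$"              # the term itself, anchored at the end
    )
    return re.compile(pattern)


matcher = build_tag_pattern("dog", ["red", "big"])

examples = [
    "dog", "red dog", "red big dog", "big red dog",      # should match
    "red red dog", "red big red dog", "small red dog",   # should not match
]
for s in examples:
    print(f"{s!r}: {bool(matcher.match(s))}")
```

With the example strings from the question, the first four print True and the last three print False. The negative lookahead is what enforces the "at most once" rule; the repeated group only checks that everything before the term is a known tag.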
