I need to match all strings that contain one word of a list, but only if that word is not immediately preceded by another specific word. I have this regex:
.*(?<!forbidden)\b(word1|word2|word3)\b.*
It still matches a sentence like hello forbidden word1, because forbidden is matched by .*. But if I remove the .*, I no longer match strings like hello word1, which I do want to match.
Note that I want to match a string like forbidden hello word1.
Could you suggest how to fix this problem?
CodePudding user response:
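This one seems to work well:
^.*\b(?!(?:forbidden|word[1-3])\b)\w+ (word[1-3]).*$
\b(?!(?:forbidden|word[1-3])\b)\w+ matches the word directly before the target and checks that it is neither forbidden nor word[1-3].
So it matches hi forbidden hello word1 test but not hi hello forbidden word2 test. Note that it requires some word and a single space before the target, so a string consisting of word1 alone does not match.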
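A quick way to check this pattern is a small Python script. This is only a sketch: it assumes Python's re flavour and a \w+ quantifier for the preceding word, and the helper name is_match and the last two sample strings are made up for illustration.

```python
import re

# Pattern from the answer above: the word directly before the target
# must be neither "forbidden" nor one of the target words.
PATTERN = re.compile(r"^.*\b(?!(?:forbidden|word[1-3])\b)\w+ (word[1-3]).*$")


def is_match(text: str) -> bool:
    """Return True if the line satisfies PATTERN (hypothetical helper)."""
    return PATTERN.match(text) is not None


samples = [
    "hi forbidden hello word1 test",  # quoted in the answer: should match
    "hi hello forbidden word2 test",  # quoted in the answer: should not match
    "hello word1",                    # illustrative sample
    "word1",                          # illustrative sample: no preceding word
]
for sample in samples:
    print(f"{sample!r}: {is_match(sample)}")
```

The last sample shows the limitation: with no word in front of the target, the pattern has nothing for \w+ to consume.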
CodePudding user response:
If you want to match the entire string, try this:
^(.(?<!forbidden (word1|word2|word3)\b))*((?<!forbidden )\b(word1|word2|word3)\b)(.(?<!forbidden (word1|word2|word3)\b))*$
The idea comes from this thread: Regular expression to match a line that doesn't contain a word.
I've just reversed the order of the look-around: the dot comes first and is followed by a look-behind, instead of a look-ahead placed before the dot.
^(.(?<!forbidden (word1|word2|word3)\b))* discards any string that contains the pattern forbidden (word1|word2|word3).
((?<!forbidden )\b(word1|word2|word3)\b) is what you defined.
But I can't understand why you need this requirement.
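For anyone who wants to try it, here is a minimal Python sketch of this approach. It assumes Python's re flavour (the look-behinds are fixed-width, so they are accepted), uses non-capturing groups, and joins the three parts directly with no literal space between them. The names SAFE_CHARS, TARGET and is_match are hypothetical, chosen only for this example.

```python
import re

# Any run of characters, none of which completes "forbidden wordN".
SAFE_CHARS = r"(?:.(?<!forbidden (?:word1|word2|word3)\b))*"
# A target word that is not directly preceded by "forbidden ".
TARGET = r"(?<!forbidden )\b(?:word1|word2|word3)\b"

# The patterns contain no braces, so an f-string is safe here.
PATTERN = re.compile(f"^{SAFE_CHARS}{TARGET}{SAFE_CHARS}$")


def is_match(text: str) -> bool:
    """Return True if the whole string is accepted (hypothetical helper)."""
    return PATTERN.match(text) is not None


for sample in (
    "hello word1",
    "hello forbidden word1",
    "forbidden hello word1",
    "word2 then forbidden word1",
):
    print(f"{sample!r}: {is_match(sample)}")
```

Note the design choice this implies: a string is rejected as soon as it contains forbidden followed by a target word anywhere, even if another, valid target word also occurs, which is why the last sample is not matched.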
CodePudding user response:
Have a look at word boundaries: \bword can never touch a word character to the left, so (?<!forbidden)\b in front of a word never rejects anything.
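A small Python sketch of that point, assuming Python's re flavour. The variable names are made up for the example, and the second pattern, with a literal space inside the look-behind, is only there for contrast.

```python
import re

text = "hello forbidden word1"
start = text.index("word1")

# The nine characters right before "word1" are "orbidden " (with the
# trailing space), so a look-behind for the bare word cannot match here.
print(repr(text[start - len("forbidden"):start]))

# Look-behind without the separator: it can never block a match after \b.
without_separator = re.compile(r"(?<!forbidden)\b(word1|word2|word3)\b")
# Look-behind that includes the separating space.
with_separator = re.compile(r"(?<!forbidden )\b(word1|word2|word3)\b")

print(without_separator.search(text) is not None)
print(with_separator.search(text) is not None)
```

The first pattern still finds word1, while the second one does not, because the separator is now part of what the look-behind inspects.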
To match (word1|word2|word3) only if it is not preceded by forbidden and
one non-word character \W:
.*?\b(?<!forbidden\W)(word1|word2|word3)\b.*
any amount of \W:
.*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*
Regex101 demo (in the multiline demo I used [^\w\n] instead of \W so as not to skip over lines).
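To compare the two variants, here is a rough Python sketch. It assumes Python's re flavour and that both patterns start at .*? (the leading \W being read as part of the description rather than of the pattern). The names ONE_SEPARATOR and ANY_SEPARATORS and the sample strings are hypothetical.

```python
import re

# Rejects the target only when "forbidden" plus exactly one non-word
# character comes directly before it.
ONE_SEPARATOR = re.compile(r".*?\b(?<!forbidden\W)(word1|word2|word3)\b.*")
# Rejects the target when "forbidden" is followed by any number of
# non-word characters before it.
ANY_SEPARATORS = re.compile(
    r".*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*"
)

samples = [
    "hello word1",
    "forbidden hello word1",
    "hello forbidden word1",
    "hello forbidden   word1",  # three spaces between the two words
]
for sample in samples:
    one = ONE_SEPARATOR.match(sample) is not None
    many = ANY_SEPARATORS.match(sample) is not None
    print(f"{sample!r}: one separator={one}, any separators={many}")
```

The two variants differ only on the last sample: with three spaces, the single-separator look-behind no longer sees forbidden, so the first pattern matches while the second still rejects the string.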
