Regex capture a number of words but exclude if another set of words is present

Time:01-06

I am trying to write a regex that matches when either of two words, google or apple, is present, but excludes facebook and twitter, and I am having a lot of trouble.

I tried this:

^(?!.*(facebook|twitter)).*

But I also want to exclude anything that does not contain google or apple.

Expected output:

'https://facebook/google'  --> Exclude
'https://twitter/google'   --> Exclude
'https://google/foo'       --> Include
'https://apple/'           --> Include
'https://twitter/apple'    --> Exclude
'https://nomatch'          --> Exclude

CodePudding user response:

It may be a good idea to check for exact words to avoid possible false positives (e.g. 'apple' in 'apples'):

^(?!.*?\/(?:facebook|twitter)(?:\/|$)).*?\/(?:google|apple)(?:\/|$).*$

  • ^ - Start-line anchor;
  • (?! - Open negative lookahead;
    • .*?\/ - 0+ characters (lazy) up to a forward slash;
    • (?:facebook|twitter) - A nested non-capture group with the two alternatives;
    • (?:\/|$) - A second non-capture group to assert that the previous word is followed by a forward slash or the end-string anchor;
    • ) - Close negative lookahead;
  • .*?\/ - 0+ characters (lazy) up to a forward slash;
  • (?:google|apple) - A non-capture group with the two alternatives;
  • (?:\/|$) - Another non-capture group to assert that the previous word is followed by a forward slash or the end-string anchor;
  • .*$ - 0+ characters (greedy) up to the end-string anchor.
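
As a quick sanity check, the pattern can be run against the sample URLs from the question. The sketch below assumes Python's `re` module (the question does not name a regex flavor); the helper name `is_included` and the extra 'https://apples/' sample are made up for illustration.

```python
import re

# Pattern copied from the answer; escaping "/" is harmless in Python.
PATTERN = re.compile(
    r"^(?!.*?\/(?:facebook|twitter)(?:\/|$)).*?\/(?:google|apple)(?:\/|$).*$"
)


def is_included(url: str) -> bool:
    """Return True when the URL passes the filter (hypothetical helper)."""
    return PATTERN.match(url) is not None


SAMPLES = [
    "https://facebook/google",
    "https://twitter/google",
    "https://google/foo",
    "https://apple/",
    "https://twitter/apple",
    "https://nomatch",
    "https://apples/",  # extra sample: 'apple' inside 'apples' should not count
]

for url in SAMPLES:
    verdict = "Include" if is_included(url) else "Exclude"
    print(f"{url!r:28} --> {verdict}")
```

Only 'https://google/foo' and 'https://apple/' are included, because the word has to fill a whole path segment between slashes (or run to the end of the string).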

As per your comments, there seem to be quite a few specifics (leading/trailing dots, hyphens, or other characters). Therefore, either add all of these to a character class or accept that a word boundary might be good enough for you:

^(?!.*?\b(?:facebook|twitter)\b).*?\b(?:google|apple)\b.*$
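
To see how the word-boundary variant behaves with dots and hyphens around the words, here is a small sketch, again assuming Python's `re` module. The helper name `is_included_loose` and the sample URLs are hypothetical and were chosen only to exercise the boundaries.

```python
import re

# Word-boundary variant copied from the answer.
WORD_PATTERN = re.compile(
    r"^(?!.*?\b(?:facebook|twitter)\b).*?\b(?:google|apple)\b.*$"
)


def is_included_loose(url: str) -> bool:
    """Return True when the URL passes the word-boundary filter (hypothetical helper)."""
    return WORD_PATTERN.match(url) is not None


# Hypothetical URLs, not taken from the question.
SAMPLES = [
    "https://www.google.com/search",  # dots around the word
    "https://my-apple.example/",      # hyphen before, dot after
    "https://apples.example/",        # 'apple' is only part of a longer word
    "https://twitter.com/apple",      # excluded word present
    "https://nomatch",
]

for url in SAMPLES:
    verdict = "Include" if is_included_loose(url) else "Exclude"
    print(f"{url!r:34} --> {verdict}")
```

One thing to keep in mind: `\b` treats the underscore as a word character, so a name such as 'my_google' would not count as containing the word google with this pattern.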