I am trying to write a regex that matches when either of the words google or apple is present, but excludes anything containing facebook or twitter, and I am running into a lot of problems.
I tried this:
^(?!.*(facebook|twitter)).*
But I also want to exclude anything that does not contain google or apple.
Expected output:
'https://facebook/google' --> Exclude
'https://twitter/google' --> Exclude
'https://google/foo' --> Include
'https://apple/' --> Include
'https://twitter/apple' --> Exclude
'https://nomatch' --> Exclude
CodePudding user response:
I think it may be a good idea to check for exact words to avoid possible false positives (e.g. 'apple' in 'apples'):
^(?!.*?\/(?:facebook|twitter)(?:\/|$)).*?\/(?:google|apple)(?:\/|$).*$
^ - Start-line anchor;
(?! - Open negative lookahead;
    .*?\/ - 0+ (lazy) characters up to a forward slash;
    (?:facebook|twitter) - A nested non-capture group with the two alternatives;
    (?:\/|$) - A 2nd non-capture group to assert that the previous word is followed by a forward slash or the end-string anchor;
) - Close negative lookahead;
.*?\/ - 0+ (lazy) characters up to a forward slash;
(?:google|apple) - A non-capture group with the two alternatives;
(?:\/|$) - Another non-capture group to assert that the previous word is followed by a forward slash or the end-string anchor;
.*$ - 0+ (greedy) characters up to the end-string anchor.
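To check the pattern against the expected output from the question, here is a minimal sketch in Python. The question does not say which regex flavour is in use, so Python's `re` module is an assumption, and `is_included` is only a hypothetical helper name.

```python
import re

# Pattern copied from the answer; `\/` is just an escaped forward slash.
PATTERN = re.compile(
    r'^(?!.*?\/(?:facebook|twitter)(?:\/|$)).*?\/(?:google|apple)(?:\/|$).*$'
)

def is_included(url: str) -> bool:
    """Return True when the URL passes the filter (hypothetical helper)."""
    return PATTERN.match(url) is not None

# Sample URLs taken from the question, plus 'apples' from the answer's remark.
samples = [
    'https://facebook/google',
    'https://twitter/google',
    'https://google/foo',
    'https://apple/',
    'https://twitter/apple',
    'https://nomatch',
    'https://apples/',
]

for url in samples:
    print(url, '-->', 'Include' if is_included(url) else 'Exclude')
```

Only 'https://google/foo' and 'https://apple/' should be reported as Include; 'https://apples/' is excluded because 'apple' is not followed by a slash or the end of the string.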
As per your comments, there seem to be quite a few specifics (leading/trailing dots, hyphens, or something else). Therefore, maybe add all of these to a character class, or accept that a word boundary might be good enough for you:
^(?!.*?\b(?:facebook|twitter)\b).*?\b(?:google|apple)\b.*$
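As a rough illustration of the trade-off, the sketch below compares the slash-delimited pattern with the word-boundary one in Python (again an assumed flavour). The URLs with dots and hyphens are made-up examples, not taken from the question, and `STRICT` and `LOOSE` are hypothetical names.

```python
import re

# Slash-delimited variant: the word must sit between '/' and '/' or end of string.
STRICT = re.compile(
    r'^(?!.*?\/(?:facebook|twitter)(?:\/|$)).*?\/(?:google|apple)(?:\/|$).*$'
)
# Word-boundary variant: any non-word character (or the string edge) delimits the word.
LOOSE = re.compile(
    r'^(?!.*?\b(?:facebook|twitter)\b).*?\b(?:google|apple)\b.*$'
)

# Made-up URLs with leading/trailing dots and hyphens around the words.
samples = [
    'https://www.google.com/',
    'https://my-apple.example/',
    'https://www.twitter.com/google',
    'https://apples/',
]

for url in samples:
    strict = 'Include' if STRICT.match(url) else 'Exclude'
    loose = 'Include' if LOOSE.match(url) else 'Exclude'
    print(f'{url}: strict={strict}, loose={loose}')
```

Note that `\b` treats an underscore as a word character, so something like 'google_play' would not count as the word 'google' under the word-boundary variant.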
