Regex matching a link excluding a word from a string

Time:01-12

I'm trying to build a simple regex detection system for a multiline string containing both text and URLs. Suppose the string is:

this is a list of youtube videos
https://www.youtube.com/watch...
https://www.youfake.com/watch...
https://www.youlube.co.uk/watch...

I want the regex to capture exactly the beginning of the site name and the /watch part, but never match youtube. In the string above, it should pick up you from youfake, but only if /watch is present, and likewise for youlube (along with the domain and the /watch part). So it should match lines 3 and 4 but not line 2, which is youtube.

The regex I currently have is (you).*(\.com|\.co.uk).*(\/watch), but it matches all three links. How can I exclude the exact word 'youtube' while still using 'you' as part of the regex?
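To make the problem concrete, here is a minimal sketch that runs the pattern from the question against the sample string. Python's re module is an assumption (the question does not name a regex flavor), and the variable names are illustrative only. The trailing "..." in each URL is kept as literal dots.

```python
import re

# Sample text from the question, with the elided URL tails kept as literal dots.
sample = """this is a list of youtube videos
https://www.youtube.com/watch...
https://www.youfake.com/watch...
https://www.youlube.co.uk/watch..."""

# The pattern exactly as given in the question.
original = re.compile(r"(you).*(\.com|\.co.uk).*(\/watch)")

# Collect the full text of every match.
original_hits = [m.group(0) for m in original.finditer(sample)]
for hit in original_hits:
    print(hit)
```

All three URLs are matched, including the youtube one, because nothing in the pattern restricts what may follow you.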

CodePudding user response:

You can use

(you)(?!tube\b).*(\.co(?:m|\.uk)).*(/watch)


I added a negative lookahead after (you) so that there is no match when the you that was found is part of the word youtube.

Details:

  • (you) - Group 1: you
  • (?!tube\b) - a negative lookahead that matches a location that is not immediately followed by tube and a word boundary
  • .* - any zero or more chars other than line break chars, as many as possible
  • (\.co(?:m|\.uk)) - Group 2: .co and then either m or .uk
  • .* - any zero or more chars other than line break chars, as many as possible
  • (/watch) - Group 3: /watch.
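The pattern above can be checked with a short sketch against the sample string from the question. Python's re module is assumed here since no regex flavor is named, and the variable names are illustrative only.

```python
import re

# Sample text from the question, with the elided URL tails kept as literal dots.
text = """this is a list of youtube videos
https://www.youtube.com/watch...
https://www.youfake.com/watch...
https://www.youlube.co.uk/watch..."""

# (you) followed by a negative lookahead that rejects "tube" as a whole word tail.
pattern = re.compile(r"(you)(?!tube\b).*(\.co(?:m|\.uk)).*(/watch)")

# Each match yields the three capture groups: prefix, domain suffix, path.
matches = [m.groups() for m in pattern.finditer(text)]
for groups in matches:
    print(groups)
# ('you', '.com', '/watch')
# ('you', '.co.uk', '/watch')
```

Only the youfake and youlube lines match. Because of the \b in the lookahead, a hypothetical host such as youtubes.com would still match, since tube there is not followed by a word boundary.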