I'm trying to build a simple regex detection system for a multiline string containing both text and URLs. Suppose the string is:
this is a list of youtube videos
https://www.youtube.com/watch...
https://www.youfake.com/watch...
https://www.youlube.co.uk/watch...
I want the regex to capture the beginning of the site name (you) and also the /watch part, but never match youtube. In the string above, it should pick up you from youfake, but only if /watch is present, and likewise for youlube (with its domain) and the /watch part. So it should match lines 3 and 4, but not line 2, which is youtube.
The regex I currently have is (you).*(\.com|\.co.uk).*(\/watch), but it matches all three links. How can I exclude the exact word 'youtube' while still using 'you' as part of the regex?
CodePudding user response:
You can use
(you)(?!tube\b).*(\.co(?:m|\.uk)).*(/watch)
I added a negative lookahead after (you) so that there is no match when the you that was found is part of the word youtube.
Details:
(you) - Group 1: you
(?!tube\b) - a negative lookahead that matches a location that is not immediately followed with tube and a word boundary
.* - any zero or more chars other than line break chars, as many as possible
(\.co(?:m|\.uk)) - Group 2: .co and then either m or .uk
.* - any zero or more chars other than line break chars, as many as possible
(/watch) - Group 3: /watch.
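To check the pattern against the sample string from the question, here is a minimal sketch in Python. The question does not say which regex engine is used, so Python's re module is an assumption, and the names TEXT, PATTERN and find_matches are made up for illustration. The sketch searches line by line, so each reported match corresponds to one line of the input.

```python
import re

# Sample text from the question; the trailing "..." is kept as literal text.
TEXT = """this is a list of youtube videos
https://www.youtube.com/watch...
https://www.youfake.com/watch...
https://www.youlube.co.uk/watch..."""

# "you" not followed by "tube" plus a word boundary, then .com or .co.uk,
# then /watch somewhere later on the same line.
PATTERN = re.compile(r"(you)(?!tube\b).*(\.co(?:m|\.uk)).*(/watch)")


def find_matches(text):
    """Return a (line, groups) pair for every line that contains a match."""
    results = []
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m:
            results.append((line, m.groups()))
    return results


for line, groups in find_matches(TEXT):
    print(line, groups)
# https://www.youfake.com/watch... ('you', '.com', '/watch')
# https://www.youlube.co.uk/watch... ('you', '.co.uk', '/watch')
```

Because of the \b inside the lookahead, only the exact word youtube is excluded; a longer name such as youtubes would still be allowed to match.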
