Given a URL list, how can I split it into 3 sublists:
one for YT videos, a second for YT channels, and a third for all the rest?
const paragraph1 = 'www.youtube.com/watch?v=NsjeEt1ZpqQ';
// Dots, slashes and the question mark are escaped; the named group uses the (?<name>...) syntax
const regex1 = /www\.youtube\.com\/watch\?v=(?<videoId>[A-Z0-9]+)/gi;
const paragraph2 = 'https://www.youtube.com/channel/UCKqFqiCe1dCUxRe0_YNZ6gg';
const regex2 = /www\.youtube\.com\/channel\/(?<channelId>[A-Z_0-9]+)/gi;
// Separate variable names, since a const cannot be declared twice in one scope
const found1 = paragraph1.match(regex1);
console.log(found1);
const found2 = paragraph2.match(regex2);
console.log(found2);
I tried to sandbox this on this site.
CodePudding user response:
Since you are planning to split some URL string list into three different parts, you can use three different patterns:
www\.youtube\.com\/watch\?v=(?<videoId>\S+)
www\.youtube\.com\/channel\/(?<videoId>\S+)
www\.youtube\.com(?!\/(?:channel\/|watch\?v=))\S*
See the regex #1, regex #2 and regex #3 demos. Note that you need an ECMAScript 2018 compliant JavaScript environment for the named capturing groups to work. Also, note that the dots are escaped wherever they denote literal dots.
The patterns mean:
www\.youtube\.com\/watch\?v= - a literal www.youtube.com/watch?v= string
(?<videoId>\S+) - Group "videoId": one or more non-whitespace chars
www\.youtube\.com\/channel\/(?<videoId>\S+) - a literal www.youtube.com/channel/ string and then Group "videoId" capturing one or more non-whitespace chars
www\.youtube\.com(?!\/(?:channel\/|watch\?v=))\S* - a www.youtube.com string, then a negative lookahead that fails the match if, immediately to the right, there is a / char followed by channel/ or watch?v=, and then zero or more non-whitespace chars are consumed.
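To tie the patterns back to the original question, here is a minimal sketch of splitting a URL array into three lists. The helper name splitUrls and the third sample URL are made up for illustration. The sketch also assumes that "all the rest" means every URL that is neither a video nor a channel link, so it uses a plain fallback branch instead of the third pattern (which only matches other youtube.com URLs).

```javascript
// The first two patterns from above, without the g flag so test() keeps no state between calls.
const videoRe = /www\.youtube\.com\/watch\?v=(?<videoId>\S+)/;
const channelRe = /www\.youtube\.com\/channel\/(?<videoId>\S+)/;

// Hypothetical helper: sorts each URL into one of three lists.
function splitUrls(urls) {
  const videos = [];
  const channels = [];
  const others = [];
  for (const url of urls) {
    if (videoRe.test(url)) {
      videos.push(url);
    } else if (channelRe.test(url)) {
      channels.push(url);
    } else {
      // Assumption: anything that is not a video or channel link counts as "the rest".
      others.push(url);
    }
  }
  return { videos, channels, others };
}

// Sample input: the first two URLs come from the question, the third is made up.
const result = splitUrls([
  'www.youtube.com/watch?v=NsjeEt1ZpqQ',
  'https://www.youtube.com/channel/UCKqFqiCe1dCUxRe0_YNZ6gg',
  'https://example.com/page',
]);
console.log(result);

// The named group gives direct access to the captured ID.
const videoId = videoRe.exec('www.youtube.com/watch?v=NsjeEt1ZpqQ').groups.videoId;
console.log(videoId); // NsjeEt1ZpqQ
```

If only other YouTube links should land in the third list, the fallback branch could test the third pattern instead.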
If you plan to use the patterns against some mark-up text, make sure you subtract the mark-up chars from the \S pattern: change it into a negated character class with the reverse shorthand, [^\s], and add the chars to exclude after \s. Say, if the links are inside double quotes, put " there: [^\s"].
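As a rough illustration of that adjustment, the sketch below extracts the IDs from a small mark-up string. The html string and the variable names are hypothetical; only the URLs are taken from the question. It relies on String.prototype.matchAll, which needs an ES2020 environment.

```javascript
// Hypothetical mark-up snippet; the href values reuse the URLs from the question.
const html =
  '<a href="https://www.youtube.com/watch?v=NsjeEt1ZpqQ">video</a> ' +
  '<a href="https://www.youtube.com/channel/UCKqFqiCe1dCUxRe0_YNZ6gg">channel</a>';

// \S+ is replaced with [^\s"]+ so that each match stops at the closing double quote.
const videoInMarkup = /www\.youtube\.com\/watch\?v=(?<videoId>[^\s"]+)/g;
const channelInMarkup = /www\.youtube\.com\/channel\/(?<videoId>[^\s"]+)/g;

// matchAll requires the g flag and yields one match object per occurrence.
const videoIds = [...html.matchAll(videoInMarkup)].map((m) => m.groups.videoId);
const channelIds = [...html.matchAll(channelInMarkup)].map((m) => m.groups.videoId);

console.log(videoIds);   // [ 'NsjeEt1ZpqQ' ]
console.log(channelIds); // [ 'UCKqFqiCe1dCUxRe0_YNZ6gg' ]
```

With plain \S+ the first match would run past the quote and swallow ">video</a> as well, since none of those chars are whitespace.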
