The objective of this question is to extract a URL and a dimension from each candidate in an HTML srcset attribute string. Specifically, the rules are:
- The URL starts with http or https
- The URL may contain commas
- The URL cannot contain spaces
- The dimension is made of digits followed by x or w, although the digits may not be followed by either letter at all
Because of this, the matching approach I want is: find http or https and match until a space, then match digits optionally followed by w or x, then a comma. A space after that comma marks the end of the match.
A candidate usually looks like https://url.com 650w, https://url.com 650, or https://url.com 650x; the input does not follow a strict format.
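To make the matching steps above concrete, they can be sketched without a regex. This is a minimal Python sketch, not the asker's code: scan_srcset is a hypothetical helper, and it assumes the URL ends at the first literal space and that a candidate ends at a comma or at the end of the string.

```python
from typing import List, Tuple

DIGITS = "0123456789"


def scan_srcset(s: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: a regex-free scan following the steps described above."""
    results: List[Tuple[str, str]] = []
    i = 0
    while True:
        # 1. Find the next "http" and check that it really starts http:// or https://.
        start = s.find("http", i)
        if start == -1:
            break
        if not (s.startswith("http://", start) or s.startswith("https://", start)):
            i = start + 4
            continue
        # 2. The URL runs until the next space (commas are allowed inside it).
        space = s.find(" ", start)
        if space == -1:
            break
        url = s[start:space]
        # 3. Digits, optionally followed by a single "w" or "x".
        j = space + 1
        while j < len(s) and s[j] in DIGITS:
            j += 1
        if j == space + 1:
            i = space + 1  # no digits after the space: not a candidate
            continue
        if j < len(s) and s[j] in "wx":
            j += 1
        # 4. A candidate must end at a comma or at the end of the string.
        if j == len(s) or s[j] == ",":
            results.append((url, s[space + 1:j]))
        i = j
    return results


example = "https://url.com 650w, https://url.com 650, https://url.com 650x"
print(scan_srcset(example))
# [('https://url.com', '650w'), ('https://url.com', '650'), ('https://url.com', '650x')]
```

Each step maps onto one piece of the regex-based approach: the literal-space search plays the role of matching non-whitespace, and the final comma-or-end check plays the role of a lookahead.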
Here is my attempted regex (I tested it in a Regex101 demo). The problem is that it does not group the matches correctly:
(https?:\/\/(?:.*(?:\s+\d+[wx])(?:,\s*)?)+)
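One way to see the grouping problem is to run the attempted pattern against a short two-candidate string. This Python sketch assumes the spaces in the scraped pattern were originally + quantifiers, and the sample string is made up for illustration.

```python
import re

# ASSUMPTION: the attempted pattern, with "+" restored where the scraped text showed spaces.
ATTEMPT = re.compile(r"(https?:\/\/(?:.*(?:\s+\d+[wx])(?:,\s*)?)+)")

# Made-up two-candidate input.
sample = "https://a.example/x.jpg 640w, https://b.example/y.jpg 750w"

# ".*" is greedy: it runs to the end of the line, then backtracks only as far
# as the LAST whitespace + digits + w/x, so both candidates share one match.
matches = ATTEMPT.findall(sample)
print(len(matches))  # 1
print(matches[0] == sample)  # True
```

Both the greedy .* and the repeated outer group let a single match run across several candidates, which is why the output is not split per URL.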
Sample string to parse:
http://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 640w, http://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 750w, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 828, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1080x, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1200w, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1920w, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 2048w, https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 3840w,https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 3840w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=100&q=60 100w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=200&q=60 200w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=300&q=60 300w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=400&q=60 400w, 
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=500&q=60 500w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=600&q=60 600w, https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=700&q=60 700w
The expected output is:
http://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 640w
http://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 750w
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 828
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1080x
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1200w
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 1920w
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 2048w
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 3840w
https://media.endclothing.com/media/f_auto,w_600,h_600/prodmedia/media/catalog/product/0/4/04-12-2021_LL_212ATDT-CP02S-OD_1_1.jpg 3840w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=100&q=60 100w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=200&q=60 200w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=300&q=60 300w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=400&q=60 400w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=500&q=60 500w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=600&q=60 600w
https://images.unsplash.com/photo-1599420186946-7b6fb4e297f0?ixlib=rb-1.2.1&ixid=MnwxMjA3fDF8MHxlZGl0b3JpYWwtZmVlZHwxfHx8ZW58MHx8fHw=&auto=format&fit=crop&w=700&q=60 700w
Answer:
Based on the 4 rules in the question, and to get the expected output for the example string, you can use:
https?:\/\/\S* \d+[xw]?(?=,|$)
The pattern matches:
- https?:\/\/ Match the protocol, http or https
- \S* and the space after it: match optional non-whitespace chars (which can include a comma), then a literal space
- \d+[xw]? Match 1+ digits and an optional x or w
- (?=,|$) Positive lookahead, assert either a comma or the end of the string to the right
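A quick way to check the pattern is Python's re module. This is a sketch: the sample is a shortened, hypothetical stand-in for the question's string that keeps the awkward cases (commas inside the URL, a bare 828, a 1080x, and a 3840w, with no space before the next URL). The named-group pattern is an illustrative variant, not part of the answer, that returns the URL and the dimension separately.

```python
import re

# The answer's pattern: protocol, non-whitespace URL, a space, digits, optional x/w,
# then a lookahead for a comma or the end of the string.
PATTERN = re.compile(r"https?:\/\/\S* \d+[xw]?(?=,|$)")

# Hypothetical variant: the same pattern with named groups for the two parts.
PAIRS = re.compile(r"(?P<url>https?:\/\/\S*) (?P<dimension>\d+[xw]?)(?=,|$)")

# Shortened, made-up stand-in for the question's sample string.
sample = (
    "http://cdn.example/f_auto,w_600/a.jpg 640w, "
    "https://cdn.example/f_auto,w_600/a.jpg 828, "
    "https://cdn.example/f_auto,w_600/a.jpg 1080x, "
    "https://cdn.example/f_auto,w_600/a.jpg 3840w,"
    "https://cdn.example/f_auto,w_600/a.jpg 3840w"
)

# Prints one candidate per line, like the expected output in the question.
for candidate in PATTERN.findall(sample):
    print(candidate)

pairs = [(m["url"], m["dimension"]) for m in PAIRS.finditer(sample)]
print(pairs[1])  # ('https://cdn.example/f_auto,w_600/a.jpg', '828')
```

Because \S* cannot cross whitespace and the lookahead does not consume the comma, each match stops right after its own descriptor, and the next search starts at the following URL even when no space follows the comma.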
