We have URLs in the following formats. I want to extract only the digit values between the strings I specified. I tried a pattern like this: (?<=\/sub.example.com\/)(.*)(?=\?[Uu]rl|$) but it does not give the result I want.
https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/
https://sub.example.com/79084/t/64931
Expected results:
[ 79084, 64931 ]
I need to exclude /t/
CodePudding user response:
Given the sample URLs in the question it should be sufficient to simply match digits preceded by a slash:
(?<=/)\d+
Demo: https://regexr.com/6tia6
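A minimal sketch of how this approach could be applied in JavaScript, assuming an engine with lookbehind support and a one-or-more quantifier on the digits (\d+) so that whole numbers are matched. The names sampleUrls, digitsAfterSlash and slashResults are illustrative only and do not come from the answer.

```javascript
// Sample URLs from the question.
const sampleUrls = [
  "https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/",
  "https://sub.example.com/79084/t/64931",
];

// One or more digits immediately preceded by a slash.
// The g flag makes String.prototype.match return every match.
const digitsAfterSlash = /(?<=\/)\d+/g;

// Convert each matched string to a number; fall back to [] when nothing matches.
const slashResults = sampleUrls.map((url) =>
  (url.match(digitsAfterSlash) || []).map(Number)
);

console.log(slashResults);
```

Because the pattern only looks at the preceding slash, it would also pick up a number that directly follows a slash inside the Url query parameter, so it fits the sample URLs but is not a general solution.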
CodePudding user response:
Using the variable-length lookbehind support in JavaScript, you can use this regex:
(?<=\/sub\.example\.com\/(?:[^\/]*\/)*)\d+(?=(?:\/[^\/]*)*(?:\?[Uu]rl|$))
Note that it will match every number in the path after the domain name, e.g. https://sub.example.com/79084/t/64931/1234/6789 will produce 4 matches, one for each number.
RegEx Breakup:
(?<=\/sub\.example\.com\/(?:[^\/]*\/)*): Lookbehind to assert the presence of sub.example.com/ followed by 0 or more repeats of path components separated by /
\d+: Match 1+ digits
(?=(?:\/[^\/]*)*(?:\?[Uu]rl|$)): Must be followed by 0 or more repeats of path components separated by / and that must be followed by ?Url or line end.
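As a rough sketch, the regex above (with the digits quantified as \d+) could be used like this. It assumes a JavaScript engine that supports variable-length lookbehind, such as a recent Node.js or Chromium-based browser. The names pathDigits and extractPathNumbers are hypothetical helpers, not part of the answer.

```javascript
// Digits that sit in the path after sub.example.com/ and are followed,
// possibly after more path components, by ?Url / ?url or the end of the string.
const pathDigits =
  /(?<=\/sub\.example\.com\/(?:[^\/]*\/)*)\d+(?=(?:\/[^\/]*)*(?:\?[Uu]rl|$))/g;

// Hypothetical helper: return every matching number in the URL.
function extractPathNumbers(url) {
  return (url.match(pathDigits) || []).map(Number);
}

const withQuery = extractPathNumbers(
  "https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/"
);
const withoutQuery = extractPathNumbers("https://sub.example.com/79084/t/64931");
const longerPath = extractPathNumbers(
  "https://sub.example.com/79084/t/64931/1234/6789"
);

console.log(withQuery, withoutQuery, longerPath);
```

The third call illustrates the note above: every number in the path is returned, not just the first and last.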
CodePudding user response:
If all your URLs have this same format (digits, then /anything/, then digits), you can change your .* to be more specific:
(?<=\/sub.example.com\/)(\d+)\/(.*)\/(\d+)(?=\?[Uu]rl|$)
So changing .* to (\d+)\/(.*)\/(\d+) allows you to get each set of digits as its own capture group.
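A small sketch of reading the two capture groups in JavaScript, assuming the digit groups are written as (\d+). The dots in the host name are escaped here as a minor tightening, and the names twoNumbers and extractIdPair are hypothetical.

```javascript
// Group 1: first number, group 2: whatever sits between the slashes,
// group 3: last number before ?Url / ?url or the end of the string.
const twoNumbers =
  /(?<=\/sub\.example\.com\/)(\d+)\/(.*)\/(\d+)(?=\?[Uu]rl|$)/;

// Hypothetical helper: return [first, last] as numbers, or null if no match.
function extractIdPair(url) {
  const m = twoNumbers.exec(url);
  return m ? [Number(m[1]), Number(m[3])] : null;
}

const pairWithQuery = extractIdPair(
  "https://sub.example.com/79084/t/64931?Url=https://www.test.com/path/otherpath/"
);
const pairWithoutQuery = extractIdPair("https://sub.example.com/79084/t/64931");

console.log(pairWithQuery, pairWithoutQuery);
```

Since the middle segment is captured separately as group 2, the /t/ part is excluded simply by not reading that group.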
