I am trying to parse the HTML of a directory listing page using C#. The page contains many file URLs like "0220109_120548.046.jpg", but also others like "0220109_120548.046-445x265.jpg". Both point to the same picture; the second one just has its dimensions in the file name.
I need a regex that matches only the URLs of the files without the dimensions.
I tried this one: href="^"*.(gif|jpg|png)"
but it's not working.
The regex101 URL: https://regex101.com/r/APS9NY/1
CodePudding user response:
Here is one way to do so:
href=\"[^\"]*?(?<!\d{2,4}x\d{2,4})\.(gif|jpg|png)\"
See here for the online demo.
href=\": Matches href=".
[^\"]*?: Any character that isn't ", between zero and unlimited times, as few as possible.
(?<!): Negative lookbehind.
\d{2,4}: Matches between 2 and 4 digits.
x: Matches x.
\.: Matches a literal dot.
(gif|jpg|png): Matches either gif, jpg or png.
\": Matches ".
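For reference, here is a minimal sketch of how the pattern above might be used from C#. The class name ImageLinkFilter and the method ExtractOriginalImageUrls are made up for illustration, and the extra capture group around the URL is an addition that is not in the pattern as posted. The pattern is written as a verbatim string, so "" stands for one literal double quote and the \" escapes are not needed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class ImageLinkFilter
{
    // Same pattern as above, in a verbatim string ("" is one literal quote).
    // Group 1 (an addition for convenience) captures the URL without href="...".
    private static readonly Regex OriginalImageHref = new Regex(
        @"href=""([^""]*?(?<!\d{2,4}x\d{2,4})\.(?:gif|jpg|png))""");

    // Returns the href values whose file name does not end in a
    // <digits>x<digits> dimension suffix right before the extension.
    public static List<string> ExtractOriginalImageUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match match in OriginalImageHref.Matches(html))
        {
            urls.Add(match.Groups[1].Value);
        }
        return urls;
    }
}
```

Calling ImageLinkFilter.ExtractOriginalImageUrls on a page that links both "0220109_120548.046.jpg" and "0220109_120548.046-445x265.jpg" should return only the first. Note that the variable-length lookbehind works in .NET but is rejected by some other regex engines.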
