I'm trying to extract the src URL/path without the quotes, but only when it points to an image:
- src="/path/image.png" // should capture => /path/image.png
- src="/path/image.bmp" // should capture => /path/image.bmp
- src="/path/image.jpg" // should capture => /path/image.jpg
- src="https://www.site1.com" // should NOT capture
So far I have /src="(.*)"/g, but that obviously captures both images and non-images. I have been looking at lookbehind and lookahead, but I just can't put it together.
CodePudding user response:
You can use a capture group, and you should prevent the match from crossing the closing " by using a negated character class.
If you want to match either href or src:
\b(?:href|src)="([^\s"]*\.(?:png|jpg|bmp))"
Explanation
- \b   A word boundary to prevent a partial word match
- (?:href|src)="   Match either href=" or src="
- (   Start capture group 1
- [^\s"]*   Match optional chars other than a whitespace char or "
- \.(?:png|jpg|bmp)   Match one of .png, .jpg, .bmp
- )   Close group 1
- "   Match literally
// Capture group 1 holds the path without the surrounding quotes.
const regex = /\b(?:href|src)="([^\s"]*\.(?:png|jpg|bmp))"/;
[
  'src="/path/image.png" test "',
  'src="/path/image.bmp"',
  'src="/path/image.jpg"',
  'src="https://www.site1.com"', // no image extension, so no match
  'href="image.png"'
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[1]); // log only the captured path
  }
});
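To pull every image path out of a longer string rather than testing one attribute at a time, the same pattern can be used with the g flag and matchAll. This is a minimal sketch: the helper name extractImagePaths and the sample markup are made up for illustration, and the i flag is an optional addition so that upper-case extensions are accepted too.

```javascript
// Hypothetical helper: collect every image path found in src/href attributes.
function extractImagePaths(html) {
  // matchAll requires the g flag; the i flag also accepts .PNG, .Jpg, etc.
  const regex = /\b(?:href|src)="([^\s"]*\.(?:png|jpg|bmp))"/gi;
  // Each match object carries capture group 1 at index 1.
  return Array.from(html.matchAll(regex), m => m[1]);
}

// Sample markup invented for this example.
const sampleHtml =
  '<img src="/path/image.png">' +
  '<a href="https://www.site1.com">site</a>' +
  '<img src="/path/photo.JPG">';

console.log(extractImagePaths(sampleHtml));
```

The link to https://www.site1.com is skipped because it does not end in one of the listed extensions before the closing quote.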
CodePudding user response:
Try /src="([^"]*\.(?:jpg|bmp|png))"/g
Extend the alternation with every extension you consider a valid image.
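One pitfall worth checking when listing extensions: square brackets define a character class (any single one of the listed characters), not a choice between words, so [jpg|bmp|png] behaves very differently from (?:jpg|bmp|png). The sketch below compares the two forms on the inputs from the question; the helper name capture is made up for this example.

```javascript
// Character-class form: [jpg|bmp|png] matches ONE character out of j, p, g, |, b, m, n.
const withClass = /src="(.*[jpg|bmp|png])"/;
// Group form: a literal dot followed by one of the three whole extensions.
const withGroup = /src="([^"]*\.(?:jpg|bmp|png))"/;

// Hypothetical helper: return capture group 1, or null when there is no match.
function capture(regex, s) {
  const m = s.match(regex);
  return m ? m[1] : null;
}

const samples = ['src="/path/image.png"', 'src="https://www.site1.com"'];
for (const s of samples) {
  console.log(s, '| class:', capture(withClass, s), '| group:', capture(withGroup, s));
}
```

With the character class, https://www.site1.com is captured as well, because its last character m happens to be in the class; the group form rejects it.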
CodePudding user response:
If you want it to be a bit more foolproof, you can use a lookbehind and a lookahead, so that the match itself is the path without src=" and the closing quote. Expand the extension list png|bmp|jpg to test for more extensions.
/(?<=src=")[^"]*\.(?:png|bmp|jpg)(?=")/g
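A short sketch of how the lookaround approach might be used: since the lookbehind and lookahead consume no characters, String.prototype.match with the g flag returns the bare paths directly, with no capture group to unpack. This variant uses [^"]* and a literal \. so the match cannot run past a closing quote; the sample string is invented for the example, and lookbehind needs an engine that supports it (it was added in ES2018).

```javascript
// Lookbehind asserts src=" before the path; lookahead asserts the closing quote after it.
const lookaround = /(?<=src=")[^"]*\.(?:png|bmp|jpg)(?=")/g;

// Sample input invented for this example.
const markup =
  'src="/path/image.png" src="https://www.site1.com" src="/path/image.bmp"';

// With the g flag, match returns all full matches, or null when there are none.
const paths = markup.match(lookaround) || [];
console.log(paths);
```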
CodePudding user response:
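Try this: src="([^"]*image[^"]*)"
Note that this only matches when the path contains the literal text image; it does not check the file extension.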
