I have a regex that works when searching for HTML <h> family tags, but it does not work if any other tag is nested inside the <h>. See the examples below.
<h([\d]).*>\s*[\d]*\s?[.]?\s?([^<]+)<\/h([\d])>
It works on this:
<h2 style="margin-top:1em;">What is Python?</h2>
It does not work on this:
<h2 style="margin-top:1em;">Python Jobs<span >New!</span></h2>
How can I capture Python Jobs<span >New!</span> as the second group? I need 3 capturing groups: the 2 of the opening h2, Python Jobs<span >New!</span> as the second group, and the 2 of the closing h2.
CodePudding user response:
([^<]+) means to match a sequence of anything except < before </h2>. Since the nested tags contain < characters, this won't match them.
Use .+? to match the contents of the tag. The ? makes it non-greedy, so it will stop when it gets to the first </h#>.
You can also use a back-reference in the </h#> part of the match, so the closing tag is forced to match the opening tag.
<h(\d).*?>\s*\d*\s?\.?\s?(.+?)<\/h(\1)>
BTW, there's no need to put \d inside [].
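Here is a quick way to check the pattern against both sample headings. This is only a sketch in Python (the question doesn't say which regex engine is in use, so that part is an assumption), and the names HEADING, samples and heading_groups are made up for the illustration.

```python
import re

# The pattern from this answer: non-greedy content group, and a
# back-reference (\1) so the closing tag level must equal the opening one.
HEADING = re.compile(r"<h(\d).*?>\s*\d*\s?\.?\s?(.+?)<\/h(\1)>")

# The two headings from the question.
samples = [
    '<h2 style="margin-top:1em;">What is Python?</h2>',
    '<h2 style="margin-top:1em;">Python Jobs<span >New!</span></h2>',
]


def heading_groups(html):
    """Return the three captured groups for the first heading, or None."""
    match = HEADING.search(html)
    return match.groups() if match else None


for sample in samples:
    print(heading_groups(sample))
# ('2', 'What is Python?', '2')
# ('2', 'Python Jobs<span >New!</span>', '2')
```

Note that . does not match newlines by default in Python, so a heading that spans several lines would need the re.DOTALL flag.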
