How can I capture the most deeply nested elements of a "random" text?
alpha
<div>
alpha<div>
beta<div>
x < y divided by 4
</div>
</div>
</div>
<div>
<span style="font-size: 8pt;" disabled title="data">
<span>
infinite
</span>
<?= $record->id ?>
</span>
<div> equal </div>
</div>
<div> sum </div>
When x y and y > 0
<div style="font-size: 8pt;" >Summary</div>
Equation id <?= $equation->id ?>
In this example they are:
- x < y divided by 4
- infinite
- equal
- sum
- Summary
The following regex works for this example:
/(<(\w+)[^>]*>)([a-z \s<0-9]*(?!<\2))(<\/\2>)/gs
But x < y divided by 4 is just a sample; in reality it could be a LaTeX expression or a JavaScript snippet, so I need a generic solution.
Simple negated character class
Works well, but does not capture x < y divided by 4:
/(<(\w+)[^>]*>)([^<]*)(<\/\2>)/gs
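To make the limitation concrete, here is a minimal Python sketch of the negated-class attempt. It assumes the tag-name quantifier was meant to be `\w+`, and it runs on a shortened, hypothetical variant of the sample rather than the full text. The names `NEGATED`, `negated_class_texts` and `snippet` are made up for illustration.

```python
import re

# The negated-class pattern from the question, with `+` after \w
# (an assumption about what the original pattern looked like).
# re.DOTALL stands in for the /s flag.
NEGATED = re.compile(r"(<(\w+)[^>]*>)([^<]*)(</\2>)", re.DOTALL)

def negated_class_texts(text):
    """Return the stripped content (group 3) of every match."""
    return [m.group(3).strip() for m in NEGATED.finditer(text)]

# Shortened, hypothetical variant of the sample input.
snippet = "<div>beta<div>\nx < y divided by 4\n</div></div><div> equal </div>"

print(negated_class_texts(snippet))  # ['equal']
```

Because `[^<]*` stops at the first `<`, the innermost element whose text itself contains `<` is never matched, which is the behaviour described above.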
Negative lookahead
Doesn't work:
/(<(\w+)[^>]*>)(.*(?!<\2))(<\/\2>)/gs
CodePudding user response:
Here is a regex you may want to use:
.*?<(\w+)[^>]*>((?:(?!<\1>).)*?)<\/\1>|.*
Click on it for explanation and also to see how to use it.
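As a rough illustration of how this pattern could be applied, here is a Python sketch run against the sample from the question. It assumes the quantifier after `\w` is `+` and that the pattern is meant to run with the `s` (dotall) flag, as in the question's own attempts. The helper name `innermost_texts` is hypothetical. Group 1 is the tag name and group 2 the element content; the trailing `|.*` alternative only consumes leftover text, so matches without a group 1 are skipped.

```python
import re

# The answer's pattern; `+` after \w and the dotall flag are assumptions.
INNERMOST = re.compile(r".*?<(\w+)[^>]*>((?:(?!<\1>).)*?)</\1>|.*", re.DOTALL)

def innermost_texts(text):
    """Collect group 2 (element content) from matches of the first alternative."""
    results = []
    for m in INNERMOST.finditer(text):
        if m.group(1) is not None:  # skip the `.*` fallback alternative
            results.append(m.group(2).strip())
    return results

sample = """alpha
<div>
alpha<div>
beta<div>
x < y divided by 4
</div>
</div>
</div>
<div>
<span style="font-size: 8pt;" disabled title="data">
<span>
infinite
</span>
<?= $record->id ?>
</span>
<div> equal </div>
</div>
<div> sum </div>
When x y and y > 0
<div style="font-size: 8pt;" >Summary</div>
Equation id <?= $equation->id ?>
"""

print(innermost_texts(sample))
# ['x < y divided by 4', 'infinite', 'equal', 'sum', 'Summary']
```

One caveat worth checking against real data: the lookahead `(?!<\1>)` only rejects a bare opening tag such as `<div>`, so a nested opening tag that carries attributes (for example `<div class="x">`) would not be excluded. Tightening it to `(?!<\1\b)` is one possible adjustment, though that is not part of the answer above.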
