Regex capture the inner elements

Time:02-03

How can I capture the most deeply nested elements of a "random" text?

alpha
<div>
alpha<div>
beta<div>
x < y divided by 4
</div>
</div>
</div>


<div>
    <span style="font-size: 8pt;" disabled title="data">
        <span>
          infinite
        </span>
        <?= $record->id ?>
    </span>
    <div> equal </div>
</div>


<div> sum </div>

When x y and y > 0 
<div  style="font-size: 8pt;" >Summary</div> 
Equation id <?= $equation->id ?>

In this example they are:

  • x < y divided by 4
  • infinite
  • equal
  • sum
  • Summary

The following regex solves this particular example:

/(<(\w+)[^>]*>)([a-z \s<0-9]*(?!<\2))(<\/\2>)/gs

But x < y divided by 4 is just a sample; in reality it could be a LaTeX expression or a JavaScript snippet, so I need a generic solution.

Simple negated pattern

Works well, but does not capture x < y divided by 4:

/(<(\w+)[^>]*>)([^<]*)(<\/\2>)/gs

see the example
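As a rough illustration of why the negated pattern misses the first item, here is a minimal Python sketch. Python, the shortened one-line `sample` string, and the variable names are my assumptions and not part of the original question; the tag name is written as `\w+`. Because `[^<]*` stops at the first `<`, a body such as `x < y divided by 4` can never be matched in full, so only `sum` is found.

```python
import re

# The "simple negated pattern": the element body may contain anything except "<".
negated = re.compile(r"(<(\w+)[^>]*>)([^<]*)(</\2>)", re.S)

# Hypothetical, shortened version of the sample text from the question.
sample = (
    "<div>alpha<div>beta<div>x < y divided by 4</div></div></div>"
    "<div> sum </div>"
)

# Group 3 is the body of each matched element.
inner = [m.group(3).strip() for m in negated.finditer(sample)]
print(inner)  # ['sum']
```

The body containing a literal `<` is skipped entirely, which is the behaviour described above.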

Negative lookahead

Doesn't work:

/(<(\w+)[^>]*>)(.*(?!<\2))(<\/\2>)/gs

see the example
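One likely reason this attempt fails: the lookahead sits after `.*`, so it is checked only once, at the position where `.*` stops, and does not restrict what `.*` consumes along the way. The greedy `.*` then runs to the last closing tag. A small Python sketch of that behaviour follows; Python and the two-level `sample` string are assumptions made for illustration, and the tag name is written as `\w+`.

```python
import re

# The lookahead attempt: (?!<\2) is tested only at the end of the greedy .*
lookahead = re.compile(r"(<(\w+)[^>]*>)(.*(?!<\2))(</\2>)", re.S)

# Hypothetical two-level sample; the innermost body is "beta".
sample = "<div>alpha<div>beta</div></div>"

m = lookahead.search(sample)
body = m.group(3)
print(body)  # alpha<div>beta</div>
```

The match starts at the outermost `<div>` and swallows the nested element instead of isolating `beta`. To constrain every character of the body, the lookahead has to be applied before each character is consumed.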

CodePudding user response:

Here is a regex you may want to use:

.*?<(\w+)[^>]*>((?:(?!<\1>).)*?)<\/\1>|.*

Click on it for explanation and also to see how to use it.
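A minimal sketch of how this regex might be applied, assuming Python, the dot-matches-newline flag, and a shortened one-line `sample` (none of which are stated in the answer; the tag name is written as `\w+`). The lookahead `(?!<\1>)` is tested before every character of the body, so the body cannot run across a nested opening tag of the same name. Group 2 holds the inner text, and matches produced by the fallback `|.*` branch leave it unset, so they are filtered out.

```python
import re

# The answer's regex; group 1 is the tag name, group 2 the innermost body.
innermost = re.compile(r".*?<(\w+)[^>]*>((?:(?!<\1>).)*?)</\1>|.*", re.S)

# Hypothetical, shortened version of the sample text from the question.
sample = (
    "<div>alpha<div>beta<div>x < y divided by 4</div></div></div>"
    "<div> sum </div>"
)

# Keep only matches from the first alternative (group 2 is set there).
result = [
    m.group(2).strip()
    for m in innermost.finditer(sample)
    if m.group(2) is not None
]
print(result)  # ['x < y divided by 4', 'sum']
```

Note that `(?!<\1>)` only guards against a nested opening tag written without attributes, such as `<div>`; a nested `<div style="...">` would not be detected by this check.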
