Is it possible to match a nested pair with regex?

Time:02-08

I'm attempting to parse some BBCode with regex, but the nested structures are giving me a headache.

What I'm trying to parse is the following:

[COLOR="Red"]Red [COLOR="Green"]Green[/COLOR][/COLOR]

I've come up with the following pattern, which needs to handle the quotation marks around the color attribute, but it only pairs the first opening COLOR tag with the first closing COLOR tag. It's not matching the tags as a properly nested arrangement.

\[COLOR=(\"?)(.*?)(\"?)]([\s\S]*?)\[\/COLOR\]

It's being done in Dart, as follows, but I believe the problem is with my regex pattern rather than the Dart implementation:

  text = text.replaceAllMapped(RegExp(r'\[COLOR=(\"?)(.*?)(\"?)]([\s\S]*?)\[\/COLOR\]', caseSensitive: false, multiLine: true), (match) {
    return '<font style="color: ${match.group(2)}">${match.group(4)}</font>';
  });

Answer:

Matching braces (of any kind) is not a regular problem. The language of correctly nested brackets is known as the Dyck language, and it is context-free (it can be recognized by a stack machine or specified by a context-free grammar) but not regular (it cannot be recognized by a finite state machine or specified by a regular expression). While the commonly implemented "regular expressions" can do some non-regular things (due to backreferences), this is not one of them in Dart's RegExp, which has no recursive patterns.
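To see the failure concretely, here is a minimal sketch that runs the question's pattern against the sample input. The helper name `firstMatchedText` is hypothetical and exists only for this illustration. The lazy `[\s\S]*?` stops at the first `[/COLOR]` it can find, so the outer opening tag gets paired with the inner closing tag.

```dart
// The pattern from the question, unchanged.
final questionPattern = RegExp(
  r'\[COLOR=(\"?)(.*?)(\"?)]([\s\S]*?)\[\/COLOR\]',
  caseSensitive: false,
  multiLine: true,
);

/// Hypothetical helper: returns the text of the first match, or null.
String? firstMatchedText(String input) =>
    questionPattern.firstMatch(input)?.group(0);
```

For the input from the question, the match ends after the inner `[/COLOR]`, and the final `[/COLOR]` is left over unmatched.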

In general, I'd recommend using a RegExp to tokenize the input, then building the stack-based machine yourself on top.
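A minimal sketch of that tokenize-then-stack approach might look like the following. The function name `convertColorTags` and the choice of `FormatException` are assumptions made for illustration, and the tokenizer assumes the color value is quoted and consists of word characters only.

```dart
/// Hypothetical helper: converts nested [COLOR="..."] tags to <span>
/// elements and throws a FormatException if the tags are unbalanced.
String convertColorTags(String input) {
  // Tokenizer: each match is either an opening tag (group 1 is the color)
  // or a closing tag (group 1 is null).
  final token = RegExp(r'\[COLOR="(\w+)"\]|\[/COLOR\]');
  final openColors = <String>[]; // Stack of currently open tags.
  final out = StringBuffer();
  var last = 0; // End offset of the previous token.

  for (final m in token.allMatches(input)) {
    out.write(input.substring(last, m.start)); // Plain text between tokens.
    last = m.end;
    final color = m.group(1);
    if (color != null) {
      openColors.add(color); // Push on an opening tag.
      out.write('<span style="color:$color">');
    } else {
      if (openColors.isEmpty) {
        throw FormatException('Closing tag without opening tag', input, m.start);
      }
      openColors.removeLast(); // Pop on a closing tag.
      out.write('</span>');
    }
  }
  if (openColors.isNotEmpty) {
    throw FormatException('Unclosed COLOR tag', input);
  }
  out.write(input.substring(last)); // Trailing text after the last token.
  return out.toString();
}
```

With a single tag type the stack carries no more information than a counter would, but the same structure extends to several tag types, where each closing tag has to be checked against the top of the stack.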

Here, because it's simple enough, I'd just match the start and end markers and replace them individually, and not try to match the text between.

var re = RegExp(r'\[COLOR="(\w+)"\]|\[/COLOR\]');
text = text.replaceAllMapped(re, (m) {
  var color = m[1]; // The color of a start tag, null if not start tag.
  return color == null ? "</span>" : "<span style='color:$color'>";
});
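The question also mentions that the quotation marks around the color may or may not be present. One way to sketch that, keeping the same tag-by-tag replacement, is to make the quotes optional and match case-insensitively. The function name `colorTagsToSpans` is hypothetical, and the pattern still assumes the color consists of word characters only, so a value like `#ff0000` would not match.

```dart
/// Hypothetical helper: replaces each [COLOR=...] and [/COLOR] tag
/// individually, without trying to match the text between them.
String colorTagsToSpans(String text) {
  // Quotes around the color are optional; tag names match case-insensitively.
  final re = RegExp(r'\[COLOR="?(\w+)"?\]|\[/COLOR\]', caseSensitive: false);
  return text.replaceAllMapped(re, (m) {
    final color = m[1]; // Null for a closing tag.
    return color == null ? '</span>' : "<span style='color:$color'>";
  });
}
```

Like the snippet above, this does not check that the tags are balanced.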

If you want to check that the tags are balanced, we're back to having a stack (in this case so simple it's just a counter):

var re = RegExp(r'\[COLOR="(\w+)"\]|\[/COLOR\]');
var nesting = 0;
text = text.replaceAllMapped(re, (m) {
  var color = m[1];
  if (color == null) {
    if (nesting == 0) {
      throw ArgumentError.value(text, "text", "Bad nesting");
    }
    nesting--; // Decrement on close tag.
    return "</span>";
  }
  nesting++; // Increment on open tag.
  return "<span style='color:$color'>";
});
if (nesting != 0) {
  throw ArgumentError.value(text, "text", "Bad nesting");
}