How it works currently
I am able to capture the values between the brackets:
[[two b][three c]]
The result is
two b
three c
The regex for that:
\[\[(.+?)\]\[(.+?)\]\]
When I use this string
[[one a]]
Nothing is captured and that is how I expect it. Fine.
The problem
I combine the strings
[[one a]] and [[two b][three c]]
This is captured
one a]] and [[two b
three c
What I understand
As I understand it, one possible approach could be to negate the ]] string, but I don't know how to do that, and I am not sure it is the right approach.
CodePudding user response:
The . char matches any char other than line break chars, and quantifying it with a lazy quantifier does not stop it from matching basically any char. Matches are searched for from left to right, so the [[ that gets matched is the leftmost [[, and the next ][ is matched regardless of whether there was a [ or ] in between.
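This behavior can be reproduced with a small sketch. Python's re module and the variable names are assumptions made for illustration only, since the question does not name a regex flavor:

```python
import re

# The pattern from the question: lazy ".+?" can still cross "]]" and "[[".
lazy_pattern = re.compile(r"\[\[(.+?)\]\[(.+?)\]\]")

text = "[[one a]] and [[two b][three c]]"

# The match starts at the leftmost "[[" and runs up to the first "][".
lazy_match = lazy_pattern.search(text)
print(lazy_match.groups())  # ('one a]] and [[two b', 'three c')
```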
So, one approach is to exclude any square brackets between [[ and ][ using a negated character class [^\]\[]:
\[\[([^\]\[]+)\]\[([^\]\[]+)\]\]
See the regex demo. Here, [^\]\[]+, which replaced .+?, matches one or more chars other than [ and ].
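As a rough check of the negated character class approach, here is a minimal sketch. It again assumes Python's re module, and the variable names are hypothetical:

```python
import re

# Neither capture group may contain "[" or "]", so a match cannot
# spill over from "[[one a]]" into the following "[[two b][three c]]".
negated_pattern = re.compile(r"\[\[([^\]\[]+)\]\[([^\]\[]+)\]\]")

combined_matches = negated_pattern.findall("[[one a]] and [[two b][three c]]")
single_matches = negated_pattern.findall("[[one a]]")

print(combined_matches)  # [('two b', 'three c')]
print(single_matches)    # []
```

Because the character class cannot cross a bracket, the attempt starting at the first [[ fails and the engine moves on to the second one.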
Another approach is the one you mention, namely, match any chars that do not start [[ (and probably ]], too) before ][:
\[\[((?:(?!\[\[).)*?)\]\[(.*?)\]\]
\[\[((?:(?!\[\[|\][\]\[]).)*)\]\[(.*?)\]\]
See this regex demo.
The (?:(?!\[\[).)*? part matches any char (.), zero or more but as few as possible occurrences (*?), that does not start a [[ char sequence ((?!\[\[)).
The (?:(?!\[\[|\][\]\[]).)* part matches any char (.), zero or more but as many as possible occurrences (*), that does not start a [[, ]] or ][ char sequence ((?!\[\[|\][\]\[])).
Depending on the regex flavor, you can get rid of some backslashes in this regex.
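The two lookahead-based patterns above can be tried out the same way. This is only a sketch under the assumption that Python's re module is used, with illustrative variable names. Other flavors may treat the escapes differently:

```python
import re

text = "[[one a]] and [[two b][three c]]"

# Lazy variant: each char in group 1 must not start a "[[" sequence.
tempered_lazy = re.compile(r"\[\[((?:(?!\[\[).)*?)\]\[(.*?)\]\]")

# Greedy variant: each char in group 1 must not start "[[", "]]" or "][".
tempered_greedy = re.compile(r"\[\[((?:(?!\[\[|\][\]\[]).)*)\]\[(.*?)\]\]")

lazy_result = tempered_lazy.findall(text)
greedy_result = tempered_greedy.findall(text)

print(lazy_result)    # [('two b', 'three c')]
print(greedy_result)  # [('two b', 'three c')]
```

Unlike the negated character class, both variants use * in the first group, so they also accept an empty first part such as [[][x]].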
