Confused about lookbehind behaviour

Time:02-01

In the following code:

$a = ['aaa_bbb', 'aaa_bbb_ccc_AAA', 'aaa_bbb_ccc_BBB', 'aaa_bbb_ddd'];

foreach ($a as $s) {
    if (preg_match('/aaa_bbb(?:_.*)?_.*$/', $s, $g)) {
        print_r($g[0].PHP_EOL);
    }
}
echo '====='.PHP_EOL;
foreach ($a as $s) {
    if (preg_match('/aaa_bbb(?:_.*)?_.*$(?<!_ccc_BBB)/', $s, $g)) {
        print_r($g[0].PHP_EOL);
    }
}

which outputs

aaa_bbb_ccc_AAA
aaa_bbb_ccc_BBB
aaa_bbb_ddd
=====
aaa_bbb_ccc_AAA
aaa_bbb_ddd

Why does the second loop successfully ignore the 'aaa_bbb_ccc_BBB' string?

How does the lookbehind (?<!_ccc_BBB) know that it should start AFTER aaa_bbb? It was my understanding that lookbehinds "specify a group that can not match before the main expression" (from the RegExr Reference section). I thought the main expression was the entire RegEx.

CodePudding user response:

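A lookbehind, as its name suggests, looks backwards from the current position in the string, that is, from wherever the regex engine has got to when it reaches the assertion. In your case it does not look "after aaa_bbb"; it is placed after the $ anchor, so it checks the characters immediately before the end of the string.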

The regular expression can be seen as follows:

aaa_bbb                            # literally, this
       (?:_.*)?                    # optional, non-capturing: an underscore followed by anything
               _.*                 # required: an underscore followed by anything
                  $                # end of the string
                   (?<!_ccc_BBB)   # the 8 characters just before this point must not be _ccc_BBB
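
To see that the position of the assertion is what matters, here is a small sketch that runs the same data through two patterns: the original one, and a variant with the lookbehind moved to just after aaa_bbb. The helper matching() and the variable names $atEnd and $afterPrefix are made up for this illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper (not in the original post): collects the full match
// ($m[0]) for every subject string that the pattern matches.
function matching(string $pattern, array $subjects): array
{
    $out = [];
    foreach ($subjects as $s) {
        if (preg_match($pattern, $s, $m)) {
            $out[] = $m[0];
        }
    }
    return $out;
}

$a = ['aaa_bbb', 'aaa_bbb_ccc_AAA', 'aaa_bbb_ccc_BBB', 'aaa_bbb_ddd'];

// Lookbehind written after $: it is evaluated at the end of the string,
// so it inspects the last 8 characters of the subject.
$atEnd = matching('/aaa_bbb(?:_.*)?_.*$(?<!_ccc_BBB)/', $a);

// Same lookbehind written right after aaa_bbb: it is evaluated at offset 7,
// where the preceding text is "aaa_bbb", so it never rejects anything here.
$afterPrefix = matching('/aaa_bbb(?<!_ccc_BBB)(?:_.*)?_.*$/', $a);

print_r($atEnd);       // aaa_bbb_ccc_AAA, aaa_bbb_ddd
print_r($afterPrefix); // aaa_bbb_ccc_AAA, aaa_bbb_ccc_BBB, aaa_bbb_ddd
```

With the assertion moved to the front, 'aaa_bbb_ccc_BBB' is matched again, because the text behind that position is aaa_bbb rather than _ccc_BBB.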