In the following code:
$a = ['aaa_bbb', 'aaa_bbb_ccc_AAA', 'aaa_bbb_ccc_BBB', 'aaa_bbb_ddd'];
foreach ($a as $s) {
    if (preg_match('/aaa_bbb(?:_.*)?_.*$/', $s, $g)) {
        print_r($g[0].PHP_EOL);
    }
}
echo '====='.PHP_EOL;
foreach ($a as $s) {
    if (preg_match('/aaa_bbb(?:_.*)?_.*$(?<!_ccc_BBB)/', $s, $g)) {
        print_r($g[0].PHP_EOL);
    }
}
which outputs
aaa_bbb_ccc_AAA
aaa_bbb_ccc_BBB
aaa_bbb_ddd
=====
aaa_bbb_ccc_AAA
aaa_bbb_ddd
Why does the second loop successfully ignore the 'aaa_bbb_ccc_BBB' string?
How does the lookbehind (?<!_ccc_BBB) know that it should start AFTER aaa_bbb? It was my understanding that lookbehinds "specify a group that can not match before the main expression" (from the RegExr Reference section). I thought the main expression was the entire regex.
CodePudding user response:
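The lookbehind, as its name suggests, looks behind the current position in the string. It is a zero-width assertion that is checked at the exact point where it appears in the pattern, not relative to the whole regex. In your case it does not look "after aaa_bbb". It comes right after the $ anchor, so it is checked at the end of the string and looks backwards from there: the 8 characters immediately before the end must not be _ccc_BBB.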
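To make the role of position concrete, here is a minimal sketch that runs the same lookbehind at two different places in the pattern. The variable names `$atEnd` and `$afterPrefix` are only illustrative, and the second pattern is a variation introduced here for comparison, not something from the question.

```php
<?php
$s = 'aaa_bbb_ccc_BBB';

// Lookbehind placed after `$`: it is checked at the end of the string,
// where the preceding 8 characters are "_ccc_BBB", so the match is rejected.
$atEnd = preg_match('/aaa_bbb(?:_.*)?_.*$(?<!_ccc_BBB)/', $s);

// Same lookbehind moved right after `aaa_bbb`: only 7 characters precede
// that position, so "_ccc_BBB" cannot be found behind it and the match succeeds.
$afterPrefix = preg_match('/aaa_bbb(?<!_ccc_BBB)(?:_.*)?_.*$/', $s);

var_dump($atEnd);       // int(0)
var_dump($afterPrefix); // int(1)
```

The only difference between the two patterns is where the assertion sits, which is what decides which characters it inspects.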
The regular expression can be seen as follows:
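aaa_bbb         # literally this text
(?:_.*)?        # optional, non-capturing: an underscore followed by anything
_.*             # required: an underscore followed by anything
$               # end of the string (there is no m flag, so not the end of each line)
(?<!_ccc_BBB)   # the 8 characters before this position must not be _ccc_BBB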
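The commented breakdown above can be run as written by using PCRE's extended mode (the `x` flag), which ignores whitespace in the pattern and treats `#` as the start of a comment. This is a sketch of the question's second loop under that assumption. The names `$pattern`, `$inputs` and `$kept` are hypothetical.

```php
<?php
// Extended mode (/x): whitespace is ignored and `#` starts a comment.
$pattern = '/
    aaa_bbb          # literal prefix
    (?:_.*)?         # optional, non-capturing: underscore plus anything
    _.*              # required: underscore plus anything
    $                # end of the string
    (?<!_ccc_BBB)    # the 8 characters before this position must not be "_ccc_BBB"
/x';

$inputs = ['aaa_bbb', 'aaa_bbb_ccc_AAA', 'aaa_bbb_ccc_BBB', 'aaa_bbb_ddd'];

$kept = [];
foreach ($inputs as $s) {
    if (preg_match($pattern, $s, $g)) {
        $kept[] = $g[0]; // the full match
    }
}

print_r($kept); // aaa_bbb_ccc_AAA and aaa_bbb_ddd
```

Here 'aaa_bbb' is dropped because the required underscore part has nothing to match, and 'aaa_bbb_ccc_BBB' is dropped by the lookbehind at the end of the string.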
