Let's say I have the following log file (with line endings):
[xxx] test test[xxx]foobar
more data
[xxx] more data
[xxx] other data []:foo bar
more data here
[xxx] 1234
I would like to retrieve all parts starting with [xxx] up until the next occurrence of [xxx], so the result would become (\n indicating the newline here):
$result = [
'[xxx] test test[xxx]foobar \n more data',
'[xxx] more data',
'[xxx] other data []:foo bar \n more data here',
'[xxx] 1234'
]
I came up with the regex /(\[xxx\] .*)/g but it fails to match the cases where there are multiple lines per log entry. I've tried variations like /(\[xxx\] [\s.]*)/g but to no avail.
I feel like I'm missing something obvious here. What modifiers or other syntax should I use?
CodePudding user response:
You can use either of
preg_match_all('~\[xxx].*(?:\R(?!\[xxx]).*)*~', $text, $matches)
preg_match_all('~\[xxx].*?(?=\[xxx]|\z)~s', $text, $matches)
Or, if the leading [xxx] always appears at the start of a line:
preg_match_all('~^\[xxx].*(?:\R(?!\[xxx]).*)*~m', $text, $matches)
preg_match_all('~^\[xxx].*?(?=^\[xxx]|\z)~ms', $text, $matches)
The first pattern in each pair is preferable because it is more efficient: it consumes whole lines at a time, whereas the lazy .*? pattern has to test its lookahead at every character position.
Details:
^ - start of a line
\[xxx] - a [xxx] string
.* - the rest of the line
(?:\R(?!\[xxx]).*)* - zero or more sequences of
    \R(?!\[xxx]) - a line break sequence not immediately followed with [xxx]
    .* - the rest of the line.
The ^\[xxx].*?(?=^\[xxx]|\z) regex (used with the m and s flags) matches [xxx] at the start of a line, then any zero or more chars (including line breaks), as few as possible, up to a position that is immediately followed by [xxx] at the start of a line or that is the end of the string.
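To see the unrolled pattern in action, here is a minimal sketch that runs it against the sample log from the question. The variable names $text and $entries are only illustrative, and the log is assumed to use \n line endings.

```php
<?php
// Sample log from the question; an entry may span several lines.
$text = "[xxx] test test[xxx]foobar\nmore data\n[xxx] more data\n[xxx] other data []:foo bar\nmore data here\n[xxx] 1234";

// An [xxx] at the start of a line, the rest of that line, then any
// following lines that do not themselves start with [xxx].
preg_match_all('~^\[xxx].*(?:\R(?!\[xxx]).*)*~m', $text, $matches);

// $matches[0] holds the full matches, one per log entry.
$entries = $matches[0];
print_r($entries);
```

With this input, $entries should hold four strings, and the [xxx] in the middle of the first line does not start a new entry because it is not at the start of a line.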
CodePudding user response:
An alternative PHP solution using preg_split and preg_replace with a simple regex:
$data = '[xxx] test test[xxx]foobar
more data
[xxx] more data
[xxx] other data []:foo bar
more data here
[xxx] 1234';
foreach(preg_split('/^(?=\[xxx] )/m', $data) as $el) {
echo preg_replace('/\n(?!$)/', '\\n', $el);
}
Output:
[xxx] test test[xxx]foobar\nmore data
[xxx] more data
[xxx] other data []:foo bar\nmore data here
[xxx] 1234
Breakup:
/^(?=\[xxx] )/m: Using this regex in preg_split so that we split the input text every time [xxx] appears at the start of a line
/\n(?!$)/: Using this regex to replace \n in each element of the split array with a literal \n (written '\\n' in the code)
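If the entries are needed as an array (as in the question's $result) rather than echoed, the same split can be collected directly. This is a sketch under a few assumptions: the PREG_SPLIT_NO_EMPTY flag and the rtrim step are additions that are not part of the answer above, and rtrim will also strip any trailing spaces from an entry.

```php
<?php
// Same sample log as above, written with explicit \n line endings.
$data = "[xxx] test test[xxx]foobar\nmore data\n[xxx] more data\n[xxx] other data []:foo bar\nmore data here\n[xxx] 1234";

// Split before every "[xxx] " that sits at the start of a line.
// PREG_SPLIT_NO_EMPTY drops any empty piece, e.g. one produced by the
// zero-width match at the very start of the string.
$parts = preg_split('/^(?=\[xxx] )/m', $data, -1, PREG_SPLIT_NO_EMPTY);

// Every piece except the last still ends with its line break; trim it off.
$result = array_map('rtrim', $parts);
print_r($result);
```

Line breaks inside an entry are kept as real newlines here; the preg_replace step from the answer is only needed when they should be displayed as a literal \n.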
