I'm using PHP preg_match_all() to find SUBSTR() function calls in WHERE clauses in SQL queries. The match shall return both the function call including parameters and also the comparison operator and compared value. Example:
SUBSTR(d.chassinummer, 7, 2) = 'CD'
In query strings the result from SUBSTR() is compared either using equal operator or IN() operator. There are often multiple and mixed combinations of the two in any query. An example query can look like this
SELECT *
FROM tbl
WHERE col_a = 'abc'
AND SUBSTR(col_b, 7, 2) = 'cd'
AND col_c LIKE 'ACD0%'
AND SUBSTR(col_d, 10, 1) IN ('A','B','C')
AND SUBSTR(col_e, 5, 3) = 'cde'
AND SUBSTR(col_f, 2, 1) IN ('C','D','E')
Note: There are no line breaks in the query strings that I match on.
I managed to write a pattern that almost catches the cases where equals operator is used. It looks like this
preg_match_all('/substr\(.*?\) *?= *?\'.*?\'/i', $query, $matches);
However in the given example query my pattern fails by terminating a match where IN() is present on a later match. After fiddling with this for a while I surrender and turn to the experts for some support. Can anyone help me understand how to extract the following matches from the given example query
SUBSTR(col_b, 7, 2) = 'cd'SUBSTR(col_d, 10, 1) IN ('A','B','C')SUBSTR(col_e, 5, 3) = 'cde'SUBSTR(col_f, 2, 1) IN ('C','D','E')
If it is easier to divide it into two separate patterns and use two preg_match_all() calls, that is fine by me, however if it can be done in one call that would feel cleaner.
Many thanks in advance.
CodePudding user response:
For the example data (not taking nested queries into account) you might use:
substr\([^()]*\)\h*(?:=\h*'[^']*'|IN\h*\([^()]*\))
The pattern matches:
substr\([^()]*\)\h*Match optional spaces(?:Non capture group=\h*'[^']*'match=then optional spaces and from'till'not matching'in between|OrIN\h*\([^()]*\)MatchINthen optional spaces and from(till)not matching(and)in between
)Close non capture group
Example
$re = '`substr\([^()]*\)\h*(?:=\h*\'[^\']*\'|IN\h*\([^()]*\))`mi';
$str = <<<STR
SELECT *
FROM tbl
WHERE col_a = 'abc'
AND SUBSTR(col_b, 7, 2) = 'cd'
AND col_c LIKE 'ACD0%'
AND SUBSTR(col_d, 10, 1) IN ('A','B','C')
AND SUBSTR(col_e, 5, 3) = 'cde'
AND SUBSTR(col_f, 2, 1) IN ('C','D','E')
STR;
preg_match_all($re, $str, $matches, 0);
print_r($matches[0]);
Output
Array
(
[0] => SUBSTR(col_b, 7, 2) = 'cd'
[1] => SUBSTR(col_d, 10, 1) IN ('A','B','C')
[2] => SUBSTR(col_e, 5, 3) = 'cde'
[3] => SUBSTR(col_f, 2, 1) IN ('C','D','E')
)
