Home > Software engineering >  Regex to match SUBSTR function with trailing IN clause
Regex to match SUBSTR function with trailing IN clause

Time:01-19

I'm using PHP preg_match_all() to find SUBSTR() function calls in WHERE clauses in SQL queries. The match shall return both the function call including parameters and also the comparison operator and compared value. Example:

SUBSTR(d.chassinummer, 7, 2) = 'CD'

In query strings the result from SUBSTR() is compared either using equal operator or IN() operator. There are often multiple and mixed combinations of the two in any query. An example query can look like this

SELECT * 
FROM tbl 
WHERE col_a = 'abc' 
AND SUBSTR(col_b, 7, 2) = 'cd' 
AND col_c LIKE 'ACD0%' 
AND SUBSTR(col_d, 10, 1) IN ('A','B','C') 
AND SUBSTR(col_e, 5, 3) = 'cde' 
AND SUBSTR(col_f, 2, 1) IN ('C','D','E')

Note: There are no line breaks in the query strings that I match on.

I managed to write a pattern that almost catches the cases where equals operator is used. It looks like this

preg_match_all('/substr\(.*?\) *?= *?\'.*?\'/i', $query, $matches);

However in the given example query my pattern fails by terminating a match where IN() is present on a later match. After fiddling with this for a while I surrender and turn to the experts for some support. Can anyone help me understand how to extract the following matches from the given example query

  1. SUBSTR(col_b, 7, 2) = 'cd'
  2. SUBSTR(col_d, 10, 1) IN ('A','B','C')
  3. SUBSTR(col_e, 5, 3) = 'cde'
  4. SUBSTR(col_f, 2, 1) IN ('C','D','E')

If it is easier to divide it into two separate patterns and use two preg_match_all() calls, that is fine by me, however if it can be done in one call that would feel cleaner.

Many thanks in advance.

CodePudding user response:

For the example data (not taking nested queries into account) you might use:

substr\([^()]*\)\h*(?:=\h*'[^']*'|IN\h*\([^()]*\))

The pattern matches:

  • substr\([^()]*\)
  • \h* Match optional spaces
  • (?: Non capture group
    • =\h*'[^']*' match = then optional spaces and from ' till ' not matching ' in between
    • | Or
    • IN\h*\([^()]*\) Match IN then optional spaces and from ( till ) not matching ( and ) in between
  • ) Close non capture group

Regex demo | PHP demo

Example

$re = '`substr\([^()]*\)\h*(?:=\h*\'[^\']*\'|IN\h*\([^()]*\))`mi';
$str = <<<STR
SELECT * 
FROM tbl 
WHERE col_a = 'abc' 
AND SUBSTR(col_b, 7, 2) = 'cd' 
AND col_c LIKE 'ACD0%' 
AND SUBSTR(col_d, 10, 1) IN ('A','B','C') 
AND SUBSTR(col_e, 5, 3) = 'cde' 
AND SUBSTR(col_f, 2, 1) IN ('C','D','E')
STR;

preg_match_all($re, $str, $matches, 0);

print_r($matches[0]);

Output

Array
(
    [0] => SUBSTR(col_b, 7, 2) = 'cd'
    [1] => SUBSTR(col_d, 10, 1) IN ('A','B','C')
    [2] => SUBSTR(col_e, 5, 3) = 'cde'
    [3] => SUBSTR(col_f, 2, 1) IN ('C','D','E')
)
  •  Tags:  
  • Related