Grouping of regex with same name

Time:01-05

I am trying to write a regex to get the ingredient's name, quantity and unit from a string. The string can follow any of these patterns: "pohe 2 kg", "2 Kg pohe" or "2Kg Pohe". I have tried the code below:

<?PHP
    $units = array("tbsp", "ml", "g", "grams", "kg", "few drops"); // add whatever other units are allowed
  
    
    //mixed pattern
    $pattern = '/(?J)(((?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . '))|(?<q>^\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . ')(?<i>[a-zA-Z\s]+))/';

    
    $ingredients = '2kg pohe';
    
    preg_match_all($pattern, $ingredients, $m);
    print_r($m);
    $quantities = $m['q'];
    $units = array_map('trim', $m['u']);
    $ingrd = array_map('trim', $m['i']);
    print_r($quantities);
    print_r($units);
    print_r($ingrd);
?>

The above code works for the string "2kg pohe", but not for "pohe 2kg".

If anyone has an idea of what I am missing, please help me with this.

CodePudding user response:

For pohe 2kg the duplicate named groups are empty because, as the documentation of preg_match_all states for the PREG_PATTERN_ORDER flag (which is the default):

If the pattern contains duplicate named subpatterns, only the rightmost subpattern is stored in $matches[NAME].

In the pattern that you generate, 2kg pohe matches in the second part (after the alternation), whereas pohe 2kg matches only in the first part. The rightmost groups, which are the ones stored under the names, therefore hold no values for pohe 2kg.
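The effect of the default flag can be seen in a small sketch. The pattern below is a simplified stand-in for the generated one: it keeps only the kg unit and uses a non-capturing outer group, both assumptions made purely for illustration, so the group numbers 1 and 6 refer to this reduced pattern only.

```php
<?php
// Simplified stand-in for the question's pattern (single unit, non-capturing
// outer group). Groups: 1=i, 2=q, 3=u (first branch), 4=q, 5=u, 6=i (second branch).
$pattern = '/(?J)(?:(?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>kg)'
         . '|(?<q>^\d*\s*)(?<u>kg)(?<i>[a-zA-Z\s]+))/';

// Default flag: PREG_PATTERN_ORDER.
preg_match_all($pattern, 'pohe 2kg', $m);

// The name "i" refers to the rightmost "i" group (number 6), which sits in the
// branch that did not take part in the match.
var_dump($m['i'][0]);

// The first "i" group, addressed by number, does hold the ingredient.
var_dump($m[1][0]);
```

Addressing the groups by number shows that the match itself succeeded; only the name lookup points at the wrong branch.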

What you might do is use the PREG_SET_ORDER flag instead, which gives:

$ingredients = '2kg pohe';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);

Output

Array
(
    [0] => 2kg pohe
    [i] =>  pohe
    [1] => 
    [q] => 2
    [2] => 
    [u] => kg
    [3] => 
    [4] => 2
    [5] => kg
    [6] =>  pohe
)

And

$ingredients = 'pohe 2kg';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);

Output

Array
(
    [0] => pohe 2kg
    [i] => pohe 
    [1] => pohe 
    [q] => 2
    [2] => 2
    [u] => kg
    [3] => kg
)

Then you can get the named subgroups for both strings with $m[0]['i'], $m[0]['q'] and $m[0]['u'].

Note that one of the example strings contains 2Kg. To match it, you can make the pattern case insensitive, for instance by adding the i flag after the closing delimiter.
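Putting both suggestions together, a minimal sketch could look like the following. The function name parseIngredient, the returned array keys, and the step that sorts the units longest first (so that "grams" is tried before "g") are assumptions added here for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: returns the trimmed ingredient, quantity and unit,
// or null when the string does not match either layout.
function parseIngredient(string $text, array $units): ?array
{
    // Added precaution: try longer units first so "grams" wins over "g".
    usort($units, fn($a, $b) => strlen($b) <=> strlen($a));
    $alt = implode('|', array_map(fn($u) => preg_quote($u, '/'), $units));

    // (?J) allows duplicate group names; the trailing "i" flag ignores case.
    $pattern = '/(?J)(?:(?<i>^[a-z\s]+)(?<q>\d*\s*)(?<u>' . $alt . ')'
             . '|(?<q>^\d*\s*)(?<u>' . $alt . ')(?<i>[a-z\s]+))/i';

    // PREG_SET_ORDER groups the results per match, so the names resolve
    // to the branch that actually matched.
    if (!preg_match_all($pattern, $text, $m, PREG_SET_ORDER)) {
        return null;
    }

    $set = $m[0];
    return [
        'ingredient' => trim($set['i']),
        'quantity'   => trim($set['q']),
        'unit'       => strtolower(trim($set['u'])),
    ];
}

$units = ["tbsp", "ml", "g", "grams", "kg", "few drops"];
foreach (['pohe 2 kg', '2 Kg pohe', '2Kg Pohe'] as $s) {
    print_r(parseIngredient($s, $units));
}
```

Lowercasing the unit is a design choice of this sketch so that "Kg" and "kg" compare equal afterwards; drop the strtolower call if the original casing matters.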
