Clean up a comma-separated list by regex

Time:01-13

I want to clean up a comma-separated tag list by removing empty tags and extra spaces. I came up with

$str='first , second ,, third, ,fourth   suffix';
echo preg_replace('#[,]{2,}#',',',preg_replace('#\s*,+\s*#',',',preg_replace('#\s+#s',' ',$str)));

which works well so far, but is it possible to do it in one replacement?

CodePudding user response:

You can use

preg_replace('~\s*(?:(,)\s*)+|(\s)+~', '$1$2', $str)

Merging the two alternatives into one results in

preg_replace('~\s*(?:([,\s])\s*)+~', '$1', $str)

Details of the first regex:

  • \s*(?:(,)\s*)+ - zero or more whitespace characters, then one or more occurrences of a comma (captured into Group 1, $1), each followed by zero or more whitespace characters
  • | - or
  • (\s)+ - one or more whitespace characters, with the last one captured into Group 2 ($2).

In the second regex, ([,\s]) captures a single comma or a whitespace character.
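
As a quick check of the two patterns above, here is a minimal sketch that runs both on the sample string from the question. The variable names are illustrative only. An unmatched group in the replacement ($1 or $2) expands to an empty string, which is why '$1$2' works for the two-alternative pattern.

```php
<?php
// Sample input taken from the question.
$str = 'first , second ,, third, ,fourth   suffix';

// Two alternatives: a run containing commas collapses to ',' (Group 1),
// a whitespace-only run collapses to its last whitespace char (Group 2).
$twoAlternatives = preg_replace('~\s*(?:(,)\s*)+|(\s)+~', '$1$2', $str);

// Merged form: Group 1 keeps the last comma-or-whitespace char it captured.
$merged = preg_replace('~\s*(?:([,\s])\s*)+~', '$1', $str);

echo $twoAlternatives, PHP_EOL; // first,second,third,fourth suffix
echo $merged, PHP_EOL;          // first,second,third,fourth suffix
```

In the merged form a comma wins whenever the run contains one, because the greedy \s* parts consume all the whitespace around it; a whitespace character is captured only when the run has no comma.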

CodePudding user response:

You can use:

\h*([,\h])[,\h]*

Or, alternatively:

\h*([,\h])(?1)*


  • \h* - 0+ (greedy) horizontal-whitespace chars;
  • ([,\h]) - a 1st capture group that matches a comma or a horizontal-whitespace char;
  • [,\h]* - Option 1: 0+ (greedy) commas or horizontal-whitespace chars;
  • (?1)* - Option 2: recurse the 1st subpattern 0+ (greedy) times.

Replace with the 1st capture group:

$str='first , second ,, third, ,fourth   suffix';
echo preg_replace('~\h*([,\h])[,\h]*~', '$1', $str);
echo preg_replace('~\h*([,\h])(?1)*~', '$1', $str);

Both print:

first,second,third,fourth suffix
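
One practical difference between the \s-based and \h-based patterns is that \h matches only horizontal whitespace (spaces, tabs), while \s also matches line breaks. The sketch below uses a hypothetical two-line input, not taken from the question, to show the effect; which behavior is preferable depends on whether tags may span lines.

```php
<?php
// Hypothetical input with a line break inside the list (an assumption for illustration).
$input = "alpha ,\n beta ,, gamma";

// \s also matches "\n", so the line break is swallowed together with the comma.
$withS = preg_replace('~\s*(?:([,\s])\s*)+~', '$1', $input);

// \h stops at "\n", so the line break and the space after it are kept.
$withH = preg_replace('~\h*([,\h])[,\h]*~', '$1', $input);

var_dump($withS === "alpha,beta,gamma");    // bool(true)
var_dump($withH === "alpha,\n beta,gamma"); // bool(true)
```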