More elegant (shorter) solution for this regex pattern

Time:02-07

I have spent three days banging my head against the wall trying to find a single pattern that matches anything between either single or double quotes, including escaped single or double quotes inside the string, so that the matched text can be replaced without touching the enclosing quotes .. and I think that I have succeeded. Multi-line or single-line, it works. That is, this regex can be used to alter/change/sanitize 'text' or "text" (strings, in other words) in any source code (e.g. file_get_contents('some_class.php')) and to leave everything else untouched, assuming that code comments have already been removed beforehand.

Here is the regex wrapped in single quotes ..

'@"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'@msu'

.. and here is the regex wrapped in double quotes.

"@\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'@msu"

It matches perfectly against source code like this ...

// Very nasty php array
$Damn = [
  'a' => "' lorem ipsum '",
  'b' => '"\" ipsu\'m lorem  ',
  'c' => " \' YabadabaDooya \" ",
  'd\"' => ' 
     f"
     o\'"o  
                 \'bar" ',
  'e' => "'",
  "f" => '"'
];
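As a rough illustration of the "replace the text but leave the quotes alone" idea, here is a minimal sketch that feeds the single-quoted pattern from above into preg_replace_callback. The sample input and the choice of strtoupper as the transformation are assumptions made up for this example; any other callback body could be used instead.

```php
<?php
// Pattern from the question: a double-quoted or a single-quoted string,
// each allowing backslash-escaped characters inside.
$pattern = '@"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'@msu';

// Hypothetical sample input; in practice this could come from file_get_contents().
$source = <<<'SRC'
$a = "it's \"here\"";
$b = 'say "hi"';
SRC;

$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        $quote = $m[0][0];              // the enclosing quote character
        $inner = substr($m[0], 1, -1);  // everything between the quotes
        // Illustrative transformation only: upper-case the inner text.
        return $quote . strtoupper($inner) . $quote;
    },
    $source
);

echo $result, PHP_EOL;
```

Because the pattern has no capture groups, the callback peels the quotes off the full match itself before putting them back unchanged.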

Since this works as I expect, and I am not really a PCRE guru (don't ask how much 'pain' I've had in the past three days D: until I came up with this solution), if there's anyone who knows how, and is willing to help by shrinking the above regex into a more elegant/shorter one, that would be superb. I assume that the | (or) in the middle of the pattern can be moved to the beginning, so that the rest of the pattern is written just once .. and I tried God only knows what to accomplish that, but no luck.

So, the general question is: what would a shorter variant of the above pattern look like?

CodePudding user response:

Use a lazy match for any character, possibly escaped, between the quotes.

Pure regex:

(["'])(?:\\.|[\s\S])*?\1

PHP regex:

$re = '/(["\'])(?:\\\\.|[\s\S])*?\1/';

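A small sketch of how the PHP form of this pattern might be exercised with preg_match_all. The sample line is an assumption invented for the example, not something taken from the question.

```php
<?php
// The PHP-quoted pattern from this answer; group 1 captures the opening quote.
$re = '/(["\'])(?:\\\\.|[\s\S])*?\1/';

// Hypothetical sample line with both quote styles and escaped quotes.
$line = <<<'TXT'
echo "a \"b\" c", 'd \' e';
TXT;

preg_match_all($re, $line, $matches);

// $matches[0] holds the full quoted strings,
// $matches[1] the quote character each one was opened with.
print_r($matches[0]);
```

The backreference \1 is what lets a single branch serve both quote styles: whichever quote opened the string is the only one that can close it.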

CodePudding user response:

I would like to thank Mr. Wahyu Kristianto, who proposed a much more elegant and smarter solution than mine.

Here is his regex.

(["'])((?:\\\1|(?:(?!\1)).)*)(\1)

And it is the - perfect - one.

Exactly the thing that I was looking for. With additional regex options, it can be quite optimized and insanely performant. :)
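For reference, a minimal sketch of how this pattern might be written as a PHP string and used for replacement. It assumes that the "additional regex options" include the s modifier, so that . also matches newlines inside multi-line strings; the sample input and the strtoupper transformation are likewise made up for illustration.

```php
<?php
// The pattern above as a PHP single-quoted string.
// Assumption: the "s" modifier is added so "." also matches newlines.
$re = '/(["\'])((?:\\\\\1|(?:(?!\1)).)*)(\1)/s';

// Hypothetical multi-line sample input.
$source = <<<'TXT'
$x = 'it\'s
two lines';
$y = "plain";
TXT;

$result = preg_replace_callback(
    $re,
    function (array $m): string {
        // $m[1] = opening quote, $m[2] = inner text, $m[3] = closing quote
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $source
);

echo $result, PHP_EOL;
```

The three capture groups hand the callback the quotes and the inner text separately, so no substr is needed. One thing to keep in mind: the pattern only treats a backslash followed by the same quote as an escape, so a string whose last character is an escaped backslash (for example 'a\\') would not be delimited correctly.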

Not only that: by just adding a single backtick to the first character class, the regex will match single quotes, double quotes and backticks as well, and that change is required in only one place.
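A short sketch of that one-character change, again with the s modifier and a sample input that are assumptions for this example only:

```php
<?php
// Same pattern, with a backtick added to the first character class only.
$re = '/(["\'`])((?:\\\\\1|(?:(?!\1)).)*)(\1)/s';

// Hypothetical sample line using all three delimiters.
$source = <<<'TXT'
a = 'one'; b = "two"; c = `three`;
TXT;

preg_match_all($re, $source, $m);

// $m[1] lists the delimiter of each match, $m[2] the text inside it.
print_r($m[1]);
print_r($m[2]);
```

Since every later part of the pattern refers back to \1, nothing else has to change when a new delimiter is added.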

I think it can't be more decent and cleaner than this. Maybe I am wrong. But I doubt that.

Mr. Wahyu, you're - AWESOME. :)))


edit:

Aaaaand .. the devil never sleeps I guess .. I have just encountered a problem.

Seems like this regex actually breaks with huge files .. :/


edit numero 2:

Aaaannndd it seems I made a mistake last night (I was really tired and it was late) and this regex actually does work like a charm! :))

PS: Please don't blame me for these kinds of edits; there are no hidden intents, just me being a bit clumsy. Wahyu, yes, you are the boss.
