regex scan files for specific contents

Time:01-23

I'm trying to read all my project files and their contents to detect certain strings.

I have a working piece of code now, but some parts are still missing.

The goal is to scan all of my files and add items to the database if they occur in a scanned file.

For example:

Given code like @can('event-tools::menu.view'), it should return event-tools::menu.view as the "found string".

Given something like $this->middleware('can:access registration check');, it should also detect access registration check.

I currently use this regex to scan the file contents:

[^\w](@can|hasPermissionTo|hasDirectPermission)\(\s*(?P<quote>['"])(?P<string>(?:\\k{quote}|(?!\k{quote}).)*)\k{quote}\s*[\),]

Can anyone help with this, or should I use a different approach?

I check for matches using the following:

preg_match_all("/$stringPattern/siU", $fileContents, $matches);
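A side note on the flags in that call: the U modifier inverts greediness, so a plain * in the pattern behaves lazily and *? becomes greedy. That is easy to miss when reading the pattern on its own. The snippet below uses a made-up input string (not from the question) to show the difference:

```php
<?php
// Made-up sample text with two parenthesised groups on one line.
$text = '(a)(b)';

// Default (greedy): .* runs on to the last ")" it can reach.
preg_match('/\((.*)\)/', $text, $greedy);

// With the U modifier, .* behaves like .*? and stops at the first ")".
preg_match('/\((.*)\)/U', $text, $ungreedy);

echo $greedy[1], PHP_EOL;   // a)(b
echo $ungreedy[1], PHP_EOL; // a
```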

Answer:

You could use a branch reset group and match the different formats with an alternation that reuses the same named groups in each branch.

(?|(?:@can|hasPermissionTo|hasDirectPermission)\(\s*(?P<quote>['"])(?P<string>.*?)\1\)|\((?P<quote>['"])can:(?P<string>.*?)\1\)|\(\[[^][]*(?P<quote>['"])can:(?P<string>.*?)\1[^][]*]\))
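The (?| ... ) wrapper is what lets all three alternatives share the group names quote and string. A minimal illustration with made-up patterns (not part of the answer's regex): without a branch reset, the second alternative's capture lands in group 2; with it, numbering restarts in each branch, so both write to group 1 and may reuse the same name.

```php
<?php
// Plain alternation: each branch gets its own group number.
preg_match('/(?:x(\d)|y(\d))/', 'y7', $plain);

// Branch reset: numbering restarts in every branch, so both captures are
// group 1, and the branches are allowed to reuse the same group name ("d").
preg_match('/(?|x(?P<d>\d)|y(?P<d>\d))/', 'y7', $reset);

var_dump($plain[1]);   // string(0) "" (the x-branch did not participate)
var_dump($plain[2]);   // string(1) "7"
var_dump($reset[1]);   // string(1) "7"
var_dump($reset['d']); // string(1) "7"
```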

The pattern in parts, using 2 alternations (|) for 3 different variations:

  • (?| Branch reset group
    • (?:@can|hasPermissionTo|hasDirectPermission) Match 1 of the alternatives
    • \(\s* Match ( and optional whitespace chars
    • (?P<quote>['"]) Match either ' or " in group quote
    • (?P<string>.*?)\1 Capture in group string as few chars as possible, up to the same quote that was captured in group quote
    • \) Match )
    • | Or
    • \( Match (
    • (?P<quote>['"]) - Same as before
    • can: Match literally (Or use an alternation again for multiple words)
    • (?P<string>.*?)\1 - Same as before
    • \) Match )
    • | Or
    • \(\[ Match ([
    • [^][]* Optionally match any chars except [ and ]
    • (?P<quote>['"]) Same as before
    • can: Match literally
    • (?P<string>.*?)\1 Same as before
    • [^][]*]\) Optionally match any chars except [ and ] using a negated character class, then match ])
  • ) Close branch reset group
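Following the hint about using an alternation for multiple words, one option is to build the same pattern from arrays, so new call names or middleware prefixes don't require hand-editing the regex. This is a sketch: the array names $functions and $prefixes are hypothetical, and preg_quote() escapes any regex metacharacters in the entries. The resulting pattern is equivalent to the one above.

```php
<?php
// Hypothetical lists of call names and middleware prefixes to detect.
$functions = ['@can', 'hasPermissionTo', 'hasDirectPermission'];
$prefixes  = ['can'];

// Escape each entry for a /-delimited pattern and join them with |.
$quoteAll = fn (array $items): string =>
    implode('|', array_map(fn (string $s): string => preg_quote($s, '/'), $items));
$calls = $quoteAll($functions);
$pre   = $quoteAll($prefixes);

// Same three branches as the answer: call('...'), ('prefix:...') and ([..., 'prefix:...']).
$re = '/(?|(?:' . $calls . ')\(\s*(?P<quote>[\'"])(?P<string>.*?)\1\)'
    . '|\((?P<quote>[\'"])(?:' . $pre . '):(?P<string>.*?)\1\)'
    . '|\(\[[^][]*(?P<quote>[\'"])(?:' . $pre . '):(?P<string>.*?)\1[^][]*]\))/';

$sample = <<<'STR'
@can('event-tools::menu.view')
$this->middleware('can:access registration check');
Route::prefix('administration')->middleware(['auth', 'verified', 'can:access admin area'])->group(static function () {
STR;

preg_match_all($re, $sample, $matches);
print_r($matches['string']);
```

Adding another prefix to $prefixes extends both middleware branches at once.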

See a regex demo.

<?php
// Branch reset: in every alternative the quote is group 1 and the value is group "string".
$re = '/(?|(?:@can|hasPermissionTo|hasDirectPermission)\(\s*(?P<quote>[\'"])(?P<string>.*?)\1\)|\((?P<quote>[\'"])can:(?P<string>.*?)\1\)|\(\[[^][]*(?P<quote>[\'"])can:(?P<string>.*?)\1[^][]*]\))/';
$str = <<<'STR'
@can('event-tools::menu.view')
$this->middleware('can:access registration check');
Route::prefix('administration')->middleware(['auth', 'verified', 'can:access admin area'])->group(static function () {
STR;

// preg_match_all() returns the number of matches; the captured values are in $matches["string"].
$result = preg_match_all($re, $str, $matches);
print_r($matches["string"]);

Output

Array
(
    [0] => event-tools::menu.view
    [1] => access registration check
    [2] => access admin area
)
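To cover the original goal of scanning every project file, the pattern can be applied inside a recursive directory walk. The sketch below uses PHP's RecursiveDirectoryIterator; the function name findPermissionStrings, the .php-only filter and the temporary test directory are assumptions for illustration. Storing the results in the database is left out, since the question does not show the schema.

```php
<?php
/**
 * Hypothetical helper: walk $root recursively, run $re over every *.php file
 * (this includes *.blade.php) and return the unique values of the "string" group.
 */
function findPermissionStrings(string $root, string $re): array
{
    $found = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        /** @var SplFileInfo $file */
        if (!$file->isFile() || $file->getExtension() !== 'php') {
            continue; // skip directories and non-PHP files
        }
        $contents = file_get_contents($file->getPathname());
        if ($contents !== false && preg_match_all($re, $contents, $matches) > 0) {
            array_push($found, ...$matches['string']);
        }
    }
    // Remove duplicates and re-index so the result is a plain list.
    return array_values(array_unique($found));
}

// The answer's pattern.
$re = '/(?|(?:@can|hasPermissionTo|hasDirectPermission)\(\s*(?P<quote>[\'"])(?P<string>.*?)\1\)|\((?P<quote>[\'"])can:(?P<string>.*?)\1\)|\(\[[^][]*(?P<quote>[\'"])can:(?P<string>.*?)\1[^][]*]\))/';

// Usage with a throwaway directory containing two sample files.
$dir = sys_get_temp_dir() . '/permission-scan-' . uniqid();
mkdir($dir . '/views', 0777, true);
file_put_contents($dir . '/views/menu.blade.php', "@can('event-tools::menu.view')\n@endcan\n");
file_put_contents($dir . '/routes.php', "<?php\n\$this->middleware('can:access registration check');\n");

$result = findPermissionStrings($dir, $re);
sort($result); // directory iteration order is not guaranteed
print_r($result);
```

Returning a de-duplicated list keeps the database step simple: each value only needs to be inserted once.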