How to exclude redundant patterns among a large array of glob string-CodePudding

I have been working on this algorithm for days, and no idea how to figure out the most suitable/easy/optimized solution.

Here I have a large array of string as the followings

[
*.*.complete
*.*.read
*.*.update
*.order.cancel
accounting.*.delete
accounting.*.update
accounting.*.void
accounting.account.*
admin.user.read
admin.user.update
admin.format.delete
...
]

// the array may be in random order

all the values are in some wildcard patterns (in fact, they are the permissions for my system)

what i want to do is to remove redundant patterns, for example: admin.json_api.read is redundant due to *.*.read

can someone give me any suggestion/approach?

CodePudding user response：

General idea:

Each your pattern can be transformed into regex using:

new RegExp('^'   pattern
  .replace(/[./]/g, '\\$&') // escape chars (list isn't full)
  .replace(/\*/g, '(.*)')   // replace asterisk with '(.*)' - any char(s)
  '$')                      // match only full pattern

If one pattern match another one - you don't need both, because pattern with * include second one:

  if (pattern1.include('*') && pattern1.test(pattern2)) {
    // delete pattern2
  }

Simple realization can be found below (still need to optimize a bit).

Full code:

// Your initial array
const patterns = [
  '*.*.complete',
  '*.*.read',
  '*.*.update',
  '*.order.cancel',
  'accounting.*.delete',
  'accounting.*.update',
  'accounting.*.void',
  'accounting.account.*',
  'admin.user.read',
  'admin.user.update',
  'admin.format.delete',
]

// Build a new one with regexes
const withRegexes = patterns.map(pattern => {
  // Create a regex if pattern contain asterisk
  const regexp = pattern.includes('*') ? new RegExp('^'   pattern
    .replace(/[./]/g, '\\$&')
    .replace(/\*/g, '(.*)') 
    '$') : null;
  return { pattern, regexp }; 
});

// Array of indexes of elements where it's pattern already matched by another pattern
let duplicateIndexes = [];
for (let i = 0; i < withRegexes.length - 1; i  ) {
  for (let j = i   1; j < withRegexes.length; j  ) {
    if (withRegexes[i].regexp 
    && withRegexes[i].regexp.test(withRegexes[j].pattern)) {
      duplicateIndexes.push(j);
    }
  }
}

// Get unique indexes to delete in desc order
duplicateIndexes = [ ...new Set(duplicateIndexes) ].sort((a, b) => b - a);

// Clear up initial array
for (let index of duplicateIndexes) {
  patterns.splice(index, 1);
}

// New one 
console.log(patterns);

CodePudding user response：

The following approach takes different glob segment length's into account as well.

Thus in a first step the glob-array is reduced into one or more segment-length specific arrays of better inspectable glob-items.

Such an item features e.g. a regex specific pattern of its actual glob-value.

Within a final tasks each segment-length specific array gets sanitized separately into an array of non redundant glob-values.

The latter gets achieved by 1st sorting each array descending by each item's glob-value (which assures a sorting from the more to the less generic glob values) and 2nd by rejecting each item where its glob-value gets covered already by a more generic glob-value.

And the base of such a detection is the glob-value specific regex where the asterisk wild card translates into a regex pattern with the same meaning ... thus any glob value of '*.' equals a regex of /[^.] \./ and any terminating '.*' equals a regex of /\.[^.] /.

Since the sanitizing task is done via flatMap, the end result is a flat array again ...

function createGlobInspectionItem(glob) {
  const segments = glob.split('.');
  return {
    value: glob,
    pattern: glob
      .replace((/\*\./g), '[^.] .')
      .replace((/\.\*$/), '.[^.] ')
      .replace((/(?<!\^)\./g), '\\.'),
    segmentCount: segments.length,
  };
}
function collectGlobInspectionItems({ index, result }, glob) {
  const globItem = createGlobInspectionItem(glob);
  const groupKey = globItem.segmentCount;

  let groupList = index[groupKey];
  if (!groupList) {

    groupList = index[groupKey] = [];
    result.push(groupList);
  }
  groupList.push(globItem);

  return { index, result };
}

function createSanitizedGlobList(globItemList) {
  const result = [];
  let globItem;

  globItemList.sort(({ value: aValue }, { value: bValue }) =>
    (aValue > bValue && -1) || (aValue < bValue && 1) || 0
  );
  while (globItem = globItemList.pop()) {

    globItemList = globItemList.filter(({ value }) =>
      !RegExp(globItem.pattern).test(value)
    );
    result.push(globItem);
  }
  return result.map(({ value }) => value);
}
const sampleData = [

  // 3 segments

  '*.*.complete',
  '*.*.read',
  '*.*.update',
  '*.order.cancel',
  'accounting.*.delete',
  'accounting.*.update',
  'accounting.*.void',
  'accounting.account.user',
  'accounting.account.*',
  'accounting.account.admin',
  'admin.user.read',
  'admin.user.update',
  'admin.format.delete',

  // 2 segments

  '*.read',
  '*.update',
  'user.read',
  'user.update',
  'format.delete',
  'format.account',
];

console.log(
  '... intermediata inspection result grouped by section length ...',
  sampleData
    .reduce(collectGlobInspectionItems, { index: {}, result: [] })
    .result
);
console.log(
  '... final sanitized and flattened glob array ...',
  sampleData
    .reduce(collectGlobInspectionItems, { index: {}, result: [] })
    .result
    .flatMap(createSanitizedGlobList)
);

.as-console-wrapper { min-height: 100%!important; top: 0; }