What is the use of max m in the lazy quantifiers {n,m}??

Time:01-19

In regex, we have greedy and lazy quantifiers. The greedy quantifier {n,m} matches the preceding atom/character/group a minimum of n and a maximum of m occurrences, inclusive.

If I have a collection of strings:

a
aa
aaa
aaaa
aaaaaaaaaa

With a{2,4}, it matches:

  • nothing on first line
  • aa on second
  • aaa on third
  • aaaa on fourth
  • (aaaa), (aaaa), and (aa) on fifth line

That makes sense.

However, if I have a lazy quantifier a{2,4}? I get:

  • nothing on first line
  • aa on second line
  • aa on third line
  • (aa) and (aa) on fourth line
  • (aa), (aa), (aa), (aa), and (aa) on fifth line

That also makes sense. It finds the shortest possible match each time.
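The two result lists above can be reproduced with a short sketch. It assumes Python's re module rather than Vim, so the lazy form is written a{2,4}? instead of Vim's a\{-2,4}; the variable names are only illustrative.

```python
import re

# The sample lines from the question.
lines = ["a", "aa", "aaa", "aaaa", "aaaaaaaaaa"]

# Greedy: each match takes as many a's as it can, up to 4.
greedy = [re.findall(r"a{2,4}", s) for s in lines]

# Lazy with a max: each match stops as soon as it has 2 a's.
lazy_bounded = [re.findall(r"a{2,4}?", s) for s in lines]

# Lazy without a max: with nothing after the quantifier, same result.
lazy_unbounded = [re.findall(r"a{2,}?", s) for s in lines]

for s, g, lb, lu in zip(lines, greedy, lazy_bounded, lazy_unbounded):
    print(f"{s!r}: greedy={g} lazy{{2,4}}?={lb} lazy{{2,}}?={lu}")
```

When nothing follows the quantifier in the pattern, the lazy bounded and lazy unbounded forms return identical matches here, which is exactly the observation behind the question.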

The part I want to clarify: is there any use in passing a max value m to a lazy quantifier of the form {n,m}? (in this case, the 4 in {2,4}?)? Isn't the result always the same as with {2,}??

Is there a scenario where passing a max (like the 4 in {2,4}?) is useful in a lazy quantifier?

Disclaimer: I am actually using the regular expression to search inside Vim (/a\{-2,4}), not in any scripting language. I think the principle of the question is still the same.

CodePudding user response:

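It matters when you need to consider what follows the lazily quantified expression. Laziness stops the quantified expression from consuming characters that a later expression in the concatenation could match instead, but the quantifier will still expand, up to its max, if that is what the rest of the pattern needs. Consider the string aaaaab, with the match required to start at the beginning of the string:

  1. The string is not matched by a{2,4}?b, as there are too many a's for a{2,4}? to match before the b.
  2. The string is matched by a{2,}?b, since it can match as many a's as necessary.

In an unanchored search, a{2,4}?b still finds a match, but a shorter one: aaaab, starting at the second character.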
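The difference is easiest to see when the match is anchored at the start of the string. The following is a minimal sketch assuming Python's re module (not Vim), where re.match only tries position 0 and re.search tries every starting position; the variable names are illustrative.

```python
import re

text = "aaaaab"

# Anchored at the start: the lazy quantifier may expand to at most 4 a's,
# but the fifth character is still an a, so b never matches.
bounded = re.match(r"a{2,4}?b", text)

# Without a max, the lazy quantifier keeps expanding until b matches.
unbounded = re.match(r"a{2,}?b", text)

# Unanchored search: the bounded form succeeds from the second character.
searched = re.search(r"a{2,4}?b", text)

print(bounded)                             # None
print(unbounded.group())                   # aaaaab
print(searched.group(), searched.start())  # aaaab 1
```

So the max changes the outcome as soon as something follows the quantifier: it caps how far the lazy quantifier is allowed to expand while looking for the rest of the pattern.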