My web-application allows users to specify custom URI path components which comply with the following restrictions:
- All characters must be lowercase.
- Be at least 2 characters long.
- First character must match [a-z].
- The last character must match [0-9a-z].
- All other characters must match [0-9a-z_\-].
- The - and _ characters must not exist as a consecutive run of 2 or more.
  - i.e. The string must not contain --, __, _-, or -_.
I've implemented the first 5 rules in a regular-expression easily enough:
^[a-z][0-9_a-z\-]*[0-9a-z]$
...however I don't know how to implement the last rule in a single regex.
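For reference, here is a minimal sketch of how that pattern behaves. It assumes Python's re module purely for illustration (I've only been testing on regex101.com), and the names BASE and passes_base are hypothetical. The first five rules are enforced, but foo--bar still gets through:

```python
import re

# The pattern from above, covering only the first five rules.
BASE = re.compile(r"^[a-z][0-9_a-z\-]*[0-9a-z]$")

def passes_base(s: str) -> bool:
    """Return True if s satisfies the first five rules (hypothetical helper)."""
    return BASE.fullmatch(s) is not None

for candidate in ["ab", "foo-bar", "a", "Foo", "foo-", "9ab", "foo--bar"]:
    print(candidate, passes_base(candidate))
# "ab" and "foo-bar" pass; "a", "Foo", "foo-" and "9ab" are rejected;
# "foo--bar" passes even though the last rule forbids it.
```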
I thought I'd start by just trying to change the regex so it won't match -- (as in a--b). I was thinking it could be a negative lookahead, as it's asserting that the string does not contain -- (right?):
Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions just like the start and end of line, and start and end of word anchors. [...] The difference is that lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not
But adding (?!\-\-) to the regular expression (on regex101.com) in various spots, or as a lookbehind (?<!\-\-), does not stop strings like a--b from matching.
i.e. all of these patterns match foo--bar when they shouldn't.
(?!\-\-)^[a-z][0-9_a-z\-]*[0-9a-z]$
^(?!\-\-)[a-z][0-9_a-z\-]*[0-9a-z]$
^[a-z](?!\-\-)[0-9_a-z\-]*[0-9a-z]$
^[a-z](?!\-\-)(?:[0-9_a-z\-]*)[0-9a-z]$
^[a-z][0-9_a-z\-]*(?!\-\-)[0-9a-z]$
^[a-z][0-9_a-z\-]*(?<!\-\-)[0-9a-z]$
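The behaviour can be reproduced outside regex101.com with a short sketch. It assumes Python's re module, and ATTEMPTS and matching_attempts are hypothetical names. The last two lines hint at what is going on: a bare lookahead is only tested at the single position where it sits.

```python
import re

# The six attempts listed above, unchanged.
ATTEMPTS = [
    r"(?!\-\-)^[a-z][0-9_a-z\-]*[0-9a-z]$",
    r"^(?!\-\-)[a-z][0-9_a-z\-]*[0-9a-z]$",
    r"^[a-z](?!\-\-)[0-9_a-z\-]*[0-9a-z]$",
    r"^[a-z](?!\-\-)(?:[0-9_a-z\-]*)[0-9a-z]$",
    r"^[a-z][0-9_a-z\-]*(?!\-\-)[0-9a-z]$",
    r"^[a-z][0-9_a-z\-]*(?<!\-\-)[0-9a-z]$",
]

def matching_attempts(s: str) -> list[str]:
    """Return the attempts that (wrongly or rightly) match s."""
    return [p for p in ATTEMPTS if re.search(p, s)]

# Every one of the six patterns still matches the string we want to reject.
print(len(matching_attempts("foo--bar")))  # 6

# A lookahead only inspects the text at its own position:
bare = re.compile(r"(?!\-\-)")
print(bare.match("foo--bar", 0) is not None)  # True: "fo" follows position 0
print(bare.match("foo--bar", 3) is not None)  # False: "--" follows position 3
```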
CodePudding user response:
You can place the negative lookahead right after matching a-z at the start of the string.
As you don't want to match any combination of - and _ you can use 2 character classes (?!.*[_-][_-])
As the [_-][_-] part can occur anywhere in the string, you can precede it with .*, which matches zero or more of any character.
If you omit .*, the assertion is only checked at the current position, which in this case is right after matching the a-z at the start of the string, so only a pair in the 2nd and 3rd characters would be rejected.
^[a-z](?!.*[_-][_-])[0-9_a-z-]*[0-9a-z]$
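
A quick way to check the pattern is a small sketch like the one below. It assumes Python's re module (the question does not name a language), and FULL, NO_DOT_STAR and is_valid_component are hypothetical names. NO_DOT_STAR is the same pattern with .* removed from the lookahead, included only to show the difference described above.

```python
import re

# The pattern from this answer.
FULL = re.compile(r"^[a-z](?!.*[_-][_-])[0-9_a-z-]*[0-9a-z]$")

# Same pattern without .* in the lookahead, for comparison only.
NO_DOT_STAR = re.compile(r"^[a-z](?![_-][_-])[0-9_a-z-]*[0-9a-z]$")

def is_valid_component(s: str) -> bool:
    """Return True if s satisfies all six rules (hypothetical helper)."""
    # fullmatch also rejects a trailing newline, which $ alone would allow.
    return FULL.fullmatch(s) is not None

for s in ["ab", "foo-bar", "a-b_c", "foo--bar", "a__b", "a_-b", "a-_b", "a-", "Ab"]:
    print(s, is_valid_component(s))
# Only "ab", "foo-bar" and "a-b_c" are accepted.

# Without .*, only a pair directly after the first character is caught.
print(NO_DOT_STAR.fullmatch("a--b") is None)      # True: rejected
print(NO_DOT_STAR.fullmatch("foo--bar") is None)  # False: still accepted
```

The lookahead runs once, right after the first character, and scans the rest of the string for any two adjacent characters from [_-], so the remaining part of the pattern does not need to change.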
