The strings I parse with a regular expression contain a region of fixed length N where there can either be numbers or dashes. However, if a dash occurs, only dashes are allowed to follow for the rest of the region. After this region, numbers, dashes, and letters are allowed to occur.
Examples (N=5, starting at the beginning):
12345ABC
12345123
1234-1
1234--1
1----1AB
How can I correctly match this? I am currently stuck at something like (?:\d|-(?!\d)){5}[A-Z0-9\-] (for N=5), but I cannot make digits work directly after my region when a dash is present, because the negative lookahead blocks the match.
Update
Strings that should not be matched (N=5)
1-2-3-A
----1AB
--1--1A
CodePudding user response:
You could assert that the first 5 characters are either digits or - and make sure that there is no - before a digit in the first 5 chars.
^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$
^  Start of string
(?![\d-]{0,3}-\d)  Make sure that in the first 5 chars there is no - before a digit
(?=[\d-]{5})  Assert that the first 5 chars are digits or -
[A-Z\d-]+  Match 1+ times any of the listed characters
$  End of string
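As a quick sanity check, the lookahead pattern can be run against the strings from the question. This is only a sketch in Python's `re` module (the question does not name a regex flavor, so the choice of Python is an assumption), and the list names are made up for illustration.

```python
import re

# Lookahead-based pattern from this answer, for N=5.
PATTERN = re.compile(r"^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$")

# Test strings taken from the question.
should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_not_match = ["1-2-3-A", "----1AB", "--1--1A"]

for s in should_match + should_not_match:
    # bool() turns the match object (or None) into True/False
    print(f"{s!r:12} -> {bool(PATTERN.match(s))}")
```

The first five strings should report a match and the last three should not.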
If atomic groups are available:
^(?=[\d-]{5})(?>\d+-*|-{5})[A-Z\d_]*$
^  Start of string
(?=[\d-]{5})  Assert that the first 5 chars are - or a digit
(?>  Atomic group
\d+-*  Match 1+ digits and optional -
|  or
-{5}  Match 5 times -
)  Close atomic group
[A-Z\d_]*  Match optional chars A-Z, digit or _
$  End of string
CodePudding user response:
Use a non-word-boundary assertion \B:
^[-\d](?:-|\B\d){4}[A-Z\d-]*$
A non-word boundary succeeds at a position between two word characters (from \w, i.e. [A-Za-z0-9_]) or between two non-word characters (from \W, i.e. [^A-Za-z0-9_]), and also between a non-word character and the start or end of the string.
With it, each \B\d always follows a digit and can't follow a dash.
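The \B pattern extends to other region lengths by changing the repeat count to N-1. Below is a minimal sketch in Python, assuming Python's `re` flavor; `build_pattern` is a hypothetical helper name, not something from the question.

```python
import re

def build_pattern(n: int) -> re.Pattern:
    # Hypothetical helper: same \B pattern, with the region length n as a parameter.
    if n < 1:
        raise ValueError("region length must be at least 1")
    # The first region character is matched by [-\d]; the remaining n-1 are
    # either a dash or a digit that directly follows another digit (\B\d).
    return re.compile(rf"^[-\d](?:-|\B\d){{{n - 1}}}[A-Z\d-]*$")

five = build_pattern(5)
for s in ["1234-1", "1----1AB", "1-2-3-A", "----1AB"]:
    print(f"{s!r:12} -> {bool(five.match(s))}")
```

With N=5 this produces exactly the pattern shown above, so the two matching strings and the two non-matching strings from the question behave as expected.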
Another way (if lookbehinds are allowed):
^\d*-*(?<=^.{5})[A-Z\d-]*$
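One way to gain confidence in the lookbehind version is to compare it with a plain, regex-free check of the rule from the question. The sketch below assumes Python's `re` module, which accepts this lookbehind because it has a fixed width; `is_valid` and the constant names are hypothetical, written only for this comparison.

```python
import re

# Lookbehind-based pattern from this answer, for N=5.
LOOKBEHIND = re.compile(r"^\d*-*(?<=^.{5})[A-Z\d-]*$")

DIGITS = set("0123456789")
TAIL_CHARS = DIGITS | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ") | {"-"}

def is_valid(s: str, n: int = 5) -> bool:
    # Hypothetical reference check: digits, then only dashes, inside the region.
    region, tail = s[:n], s[n:]
    if len(region) < n:
        return False
    # After stripping trailing dashes, whatever is left must be digits only.
    digits_part = region.rstrip("-")
    return set(digits_part) <= DIGITS and set(tail) <= TAIL_CHARS

for s in ["12345ABC", "1234--1", "1----1AB", "1-2-3-A", "--1--1A"]:
    print(f"{s!r:12} regex={bool(LOOKBEHIND.match(s))} reference={is_valid(s)}")
```

The regex and the reference function should agree on every line; a disagreement would point to a case the pattern does not cover.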
