I need to check whether a string matches an identifier pattern.
Pattern
A word is considered an identifier if:
- it contains no characters other than alphanumeric characters;
- it doesn't start with a digit.
Input
The input string will not contain any leading or trailing spaces or other whitespace characters.
Code
I tried the following regular expressions:
- \D[a-zA-Z]\w*\D
- [ \t\n][a-zA-Z]\w*[ \t\n]
- ^\D[a-zA-Z]\w*$
None of them works.
How can I achieve this?
Answer:
Note that in your ^\D[a-zA-Z]\w*$ regex, \D can match non-alphanumeric chars, since \D matches any non-digit char, and \w also matches underscores, which are not alphanumeric.
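To make those two problems concrete, here is a small sketch that runs the question's third attempt against a few inputs. It assumes Python's `re` module (the question does not name a language); the variable name `attempt` is just for illustration.

```python
import re

# The question's third attempt, unchanged.
attempt = re.compile(r"^\D[a-zA-Z]\w*$")

# \D consumes the first character, so [a-zA-Z] still needs a second one:
# a one-letter identifier cannot match.
print(bool(attempt.match("a")))      # False
# \D accepts "-" and \w accepts "_", so this non-identifier matches.
print(bool(attempt.match("-ab_c")))  # True
# A valid identifier whose second character is a digit is rejected,
# because [a-zA-Z] must match that second character.
print(bool(attempt.match("a2b3")))   # False
```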
I suggest
^(?![0-9])[A-Za-z0-9]*$
It matches:
^ - start of string
(?![0-9]) - no digit allowed immediately to the right of the current location
[A-Za-z0-9]* - zero or more ASCII letters/digits
$ - end of string.
See the regex demo.
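A quick check of the lookahead pattern, assuming Python's `re` module. One thing worth knowing: because of the `*`, the pattern also accepts an empty string; the `lookahead_nonempty` variant below is a hypothetical tweak that uses `+` to require at least one character.

```python
import re

# Pattern from the answer above: the lookahead forbids a leading digit,
# then zero or more ASCII letters/digits must run to the end.
lookahead = re.compile(r"^(?![0-9])[A-Za-z0-9]*$")

print(bool(lookahead.match("aB123")))    # True
print(bool(lookahead.match("1Ab")))      # False: digit right after ^
print(bool(lookahead.match("abc_123")))  # False: "_" is not in the class
print(bool(lookahead.match("")))         # True: * allows zero characters

# Hypothetical stricter variant: + requires at least one character.
lookahead_nonempty = re.compile(r"^(?![0-9])[A-Za-z0-9]+$")
print(bool(lookahead_nonempty.match("")))  # False
```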
Answer:
\D matches any non-digit character, including not only letters but also punctuation, whitespace, etc., and you definitely do not want those at the beginning.
You can use ^[A-Za-z][A-Za-z0-9]*$, which can be described as:
^: start of string
[A-Za-z]: a letter
[A-Za-z0-9]*: an alphanumeric character, zero or more times
$: end of string
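As a usage sketch, assuming Python's `re` module, the same pattern can be wrapped in a helper; `is_identifier` is a hypothetical name. `re.fullmatch` anchors at both ends, so `^` and `$` can be dropped, and unlike Python's `$` (which also matches just before a final `"\n"`) it rejects a trailing newline.

```python
import re

# A letter first, then any number of ASCII letters or digits.
# fullmatch requires the whole string to match, so no ^ or $ is needed.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(word: str) -> bool:
    """Return True if word is a letter followed by zero or more letters/digits."""
    return IDENTIFIER.fullmatch(word) is not None

print(is_identifier("XXXX12YZ"))  # True
print(is_identifier("12abc"))     # False
print(is_identifier("Z"))         # True
print(is_identifier("r2-d2"))     # False
```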
Answer:
An even simpler pattern for an identifier, without the negative lookahead used in Wiktor's answer:
^[^0-9][A-Za-z0-9]*$ decomposed and explained:
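^[^0-9]: word starts (^) not ([^) with a number (0-9]); more exactly, the first char is not a digit, but the second character can be a digit!
[A-Za-z0-9]*: word doesn't contain any character other than alphanumeric characters (not even hyphen or underscore) until the end ($).
Caveat: [^0-9] matches any non-digit character, including _, - or a space, so inputs such as _abc or -x also match. The positive alternative below restricts the first character to letters.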
See demo on regex101.
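A quick Python comparison of the first-character class, assuming the `re` module; `not_digit_first` and `letter_first` are illustrative names for this pattern and for the letter-first pattern from the positive alternative.

```python
import re

not_digit_first = re.compile(r"^[^0-9][A-Za-z0-9]*$")
letter_first = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")

# Each line prints: (matches not_digit_first, matches letter_first)
print(bool(not_digit_first.match("a2b3")), bool(letter_first.match("a2b3")))  # True True
print(bool(not_digit_first.match("_abc")), bool(letter_first.match("_abc")))  # True False
print(bool(not_digit_first.match("-x")), bool(letter_first.match("-x")))      # True False
```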
Positive alternative
As already suggested by Arvind Kumar Avinash: taking both rules together, the first char must be a letter (alphanumeric but not a digit), so we can change the first part of the above regex from "not-numeric" to "only-alpha".
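^[A-Za-z][A-Za-z0-9]*$ explained:
[A-Za-z]: the first char must be a letter
[A-Za-z0-9]*: the optional second and following chars can be any alphanumeric
^ and $ anchor the match to the whole string, as before.
For inputs that start with a letter or a digit this behaves the same as the pattern above, but it also rejects a leading non-alphanumeric character such as _ or -; see demo on regex101.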
Tests
| input | result | reason |
|---|---|---|
| aB123 | matches identifier | |
| Ab123 | matches identifier | |
| XXXX12YZ | matches identifier | |
| a2b3 | matches identifier | |
| a | matches identifier | |
| Z | matches identifier | |
| 0 | no match | starts with a digit |
| 1Ab | no match | starts with a digit |
| 12abc | no match | starts with a digit |
| abc_123 | no match | contains an underscore, which is not alphanumeric |
| r2-d2 | no match | contains a hyphen, which is not alphanumeric |
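The table rows can be turned into a small regression check. This is a sketch assuming Python's `re` module; `CASES`, `PATTERNS` and `failures` are hypothetical names. It runs the three anchored patterns from the answers against every row. All three agree with the table, since no row starts with a non-alphanumeric character such as _ or -.

```python
import re

# Rows from the table above: (input, expected to match?)
CASES = [
    ("aB123", True),
    ("Ab123", True),
    ("XXXX12YZ", True),
    ("a2b3", True),
    ("a", True),
    ("Z", True),
    ("0", False),
    ("1Ab", False),
    ("12abc", False),
    ("abc_123", False),
    ("r2-d2", False),
]

PATTERNS = {
    "lookahead": r"^(?![0-9])[A-Za-z0-9]*$",
    "not-digit first": r"^[^0-9][A-Za-z0-9]*$",
    "letter first": r"^[A-Za-z][A-Za-z0-9]*$",
}

def failures(pattern: str) -> list[str]:
    """Return the inputs whose match result disagrees with the table."""
    compiled = re.compile(pattern)
    return [word for word, expected in CASES
            if (compiled.match(word) is not None) != expected]

for name, pattern in PATTERNS.items():
    print(f"{name}: {failures(pattern) or 'all rows pass'}")
```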
