Validate special characters by negating Unicode letters with a regex pattern?

Time:02-08

The regex \p{L} matches the characters "ASKJKSDJKDSJÄÖÅüé" in the example string "ASKJKSDJK_-.;,DSJÄÖÅ!”#€%&/()=?`¨’<>üé", which is great, but it is the exact opposite of what I want. That leads me to negating regexes.

Goal:

I want to match every character that is neither a letter nor a number, in any language.

Could a negative regex be a natural direction for this?

I should mention one intended use for the regex I'd like to find is to validate passwords for the rule:

  • that it needs to contain at least one special character, which I define as not being a number nor a letter.

It would seem defining ranges of special characters should be avoided if possible, because why limit the possibilities? Thus my definition. I assume there could be some problems with such a wide definition, but it is a first step.

If you have suggestions for a better solution than the one I'm giving below, or just have some thoughts on the subject, please share them. I'm sure I'm not the only one who would like to learn about it. Thanks.

Note I'm using double \\ in the Java code. Platform is Java 11.

CodePudding user response:

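You can put those \\p{...} classes inside [], and a character class can be negated with ^. To exclude numbers as well as letters, list both categories in the class. This is all you need:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("[^\\p{L}\\p{N}]"); // neither a letter nor a number
Matcher m = p.matcher("ASKJKSDJK_-.;,DSJÄÖÅ!”#€%&/()=?`¨’<>üé");
while (m.find()) System.out.print(m.group(0));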

That prints:

_-.;,!”#€%&/()=?`¨’<>

Which is exactly what you're looking for, no?

No need to mess with lookaheads here.
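
For the password rule in the question, a minimal sketch of how a negated character class could be used as a check. The class name `SpecialCharCheck` and the helper `hasSpecialCharacter` are made up for illustration, and the sketch assumes the question's definition of "special" as neither a letter nor a number. Under that definition, whitespace also counts as special. The test strings use Unicode escapes so the source stays ASCII-only.

```java
import java.util.regex.Pattern;

final class SpecialCharCheck {
    // One character that is neither a Unicode letter nor a Unicode number.
    private static final Pattern SPECIAL = Pattern.compile("[^\\p{L}\\p{N}]");

    // Hypothetical helper: true if the input has at least one such character.
    static boolean hasSpecialCharacter(String password) {
        // find() succeeds as soon as any single character matches.
        return SPECIAL.matcher(password).find();
    }

    public static void main(String[] args) {
        // A-umlaut, O-umlaut, A-ring, then digits: letters and numbers only.
        System.out.println(hasSpecialCharacter("\u00C4\u00D6\u00C5123"));
        // Same string with a euro sign appended.
        System.out.println(hasSpecialCharacter("\u00C4\u00D6\u00C5123\u20AC"));
    }
}
```

Using find() rather than matches() is deliberate here: the rule only asks for at least one special character somewhere in the input, not for the whole input to be special.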

CodePudding user response:

After reading similar, though not identical, questions and some equally great answers, I came up with this solution: (?=\P{L})(?=\P{N}), which matches at any position where the next character is both not a letter and not a number. Even though numbers are asserted separately, I need to negate both to meet my definition of a special character (see the question).

This makes use of lookaheads, the non-consuming (zero-width) groups written with parentheses and ?=. The first lookahead is checked, and then the second is checked from the same position, so both conditions must hold for the same character. Thanks to @Jason Cohen for this detail in the "Regular Expressions: Is there an AND operator?" discussion.
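
A small sketch of what "non-consuming" means in practice, comparing the two lookaheads with a negated character class that expresses the same condition. The class `LookaheadDemo` and the helper `firstMatch` are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadDemo {
    // Two lookaheads at the same position: the next character must be
    // a non-letter AND a non-number. Nothing is consumed.
    static final Pattern LOOKAHEADS = Pattern.compile("(?=\\P{L})(?=\\P{N})");

    // The same condition as a negated class, consuming one character.
    static final Pattern NEGATED_CLASS = Pattern.compile("[^\\p{L}\\p{N}]");

    // Hypothetical helper: text of the first match, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The lookahead version finds a position, so the matched text is empty.
        System.out.println("[" + firstMatch(LOOKAHEADS, "ab1_c") + "]");
        // The character class version returns the character itself.
        System.out.println("[" + firstMatch(NEGATED_CLASS, "ab1_c") + "]");
    }
}
```

Both patterns give the same answer for a yes/no check with find(), but only the character class yields the offending character through group().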

The uppercase P in \P{L} and \P{N} means "not belonging to the category" in Unicode categories, i.e. it is the opposite of a lowercase p.
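
To make the \p versus \P distinction concrete, here is a short sketch that tests single characters against each form. The class and method names are illustrative only. It also shows that \P{L} and [^\p{L}] agree on the characters tried here.

```java
import java.util.regex.Pattern;

final class CategoryNegationDemo {
    // Lowercase p: the character belongs to the Letter category.
    static boolean isLetter(String ch) {
        return Pattern.matches("\\p{L}", ch);
    }

    // Uppercase P: the character does NOT belong to the Letter category.
    static boolean isNotLetter(String ch) {
        return Pattern.matches("\\P{L}", ch);
    }

    // Negated character class wrapped around the lowercase form.
    static boolean isNotLetterViaClass(String ch) {
        return Pattern.matches("[^\\p{L}]", ch);
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // e with acute accent, a letter
        System.out.println(isLetter(eAcute) + " " + isNotLetter(eAcute));
        System.out.println(isLetter("_") + " " + isNotLetter("_"));
    }
}
```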

It's not perfect for a real-world solution, but it works as a starting point at least. Note I'm using double \\ in the Java code. Platform is Java 11.
