I have a string that contains non-alphanumeric characters as well as English and non-English letters. I need to remove the non-alphanumeric characters, but keep some of them. For instance, say I want to keep only commas and colons.
Example:
String st = "I, Love: ( Coding {} -), codificación";
I want the output to be "I,Love:Coding,codificación"
Is there a regex that can do that?
Note: the method below removes all non-alphanumeric characters from the text, including the ones I want to keep.
public static String cleanText(String text) {
    return text.replaceAll("\\P{LD}+", "");
}
You can use
public static String cleanText(String text) {
    return text.replaceAll("[^\\p{L}\\p{N}:,]+", "");
    // or return text.replaceAll("[^\\p{LD}:,]+", "");
}
Details:
[^ - start of a negated character class
\p{L} - any Unicode letter
\p{N} - any Unicode number character (this includes digits)
: - a colon
, - a comma
]+ - end of the character class, repeated one or more times
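If the set of characters to keep changes, the same negated class can be built from a string of allowed characters. The sketch below is one way to do that, assuming a hypothetical helper named buildCleaner (not part of the answer above). It escapes every allowed non-letter, non-digit character with a backslash, which java.util.regex always treats as a literal before a non-alphabetic character, and compiles the pattern once so it can be reused.

```java
import java.util.regex.Pattern;

public class KeepChars {
    // Hypothetical helper: builds a pattern matching runs of characters that are
    // not Unicode letters, not Unicode numbers, and not listed in `allowed`.
    static Pattern buildCleaner(String allowed) {
        StringBuilder cls = new StringBuilder("[^\\p{L}\\p{N}");
        allowed.codePoints().forEach(cp -> {
            // Letters and digits are already kept by \p{L}\p{N}; escaping them could
            // create unintended escapes such as \a, so only punctuation is escaped.
            if (!Character.isLetterOrDigit(cp)) {
                cls.append('\\').appendCodePoint(cp);
            }
        });
        cls.append("]+");
        return Pattern.compile(cls.toString());
    }

    // Removes every run of characters matched by the compiled cleaner pattern.
    static String cleanText(String text, Pattern cleaner) {
        return cleaner.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        Pattern keepCommaColon = buildCleaner(",:");
        String st = "I, Love: ( Coding {} -), codificaci\u00f3n";
        System.out.println(cleanText(st, keepCommaColon));
    }
}
```

Escaping each allowed character means that characters with special meaning inside a class, such as ], ^, - or \, are still matched literally.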
A complete Java demo:
class Test
{
    public static void main(String[] args)
    {
        String st = "I, Love: ( Coding {} -), codificación";
        System.out.println(cleanText(st));
    }

    // Removes every run of characters that is not a Unicode letter,
    // a Unicode number, a colon or a comma.
    public static String cleanText(String text) {
        return text.replaceAll("[^\\p{L}\\p{N}:,]+", "");
    }
}
// => I,Love:Coding,codificación
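For comparison, the same filtering can be done without a regex by walking the string's code points. This is a swapped-in technique, not the answer's method, and the class name CleanByCodePoint is hypothetical. One difference to be aware of: Character.isLetterOrDigit accepts only decimal digits (category Nd), so number characters such as superscript two or Roman numerals, which \p{N} keeps, are removed here.

```java
public class CleanByCodePoint {
    // Keeps letters, decimal digits and any code point listed in `allowed`;
    // every other code point is dropped. Iterating code points (not chars)
    // keeps supplementary characters intact.
    static String cleanText(String text, String allowed) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> Character.isLetterOrDigit(cp) || allowed.indexOf(cp) >= 0)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanText("I, Love: ( Coding {} -), codificaci\u00f3n", ",:"));
    }
}
```

The regex version is shorter; the loop avoids any escaping concerns when the allowed set comes from user input.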
