How to clean a string from non-alphanumeric characters, but keep certain ones?

Time:01-31

I have a string that contains non-alphanumeric characters as well as English and non-English letters. I need to strip the non-alphanumeric characters from it, but keep a few of them. For instance, let's say I want to keep only the comma and the colon.

Example: String st = "I, Love: ( Coding {} -), codificación"

I want the output to be "I,Love:Coding,codificación"

Is there a regex that can do that?

Note: the method below removes all non-alphanumeric characters from the text.

public static String cleanText(String text) {
     return text.replaceAll("\\P{LD}+", "");
}

CodePudding user response:

You can use

public static String cleanText(String text) {
    return text.replaceAll("[^\\p{L}\\p{N}:,]+", "");
    // or return text.replaceAll("[^\\p{LD}:,]+", "");
}

Details:

  • [^ - start of a negated character class
    • \p{L} - any Unicode letter
    • \p{N} - any Unicode number (decimal digits, letter numbers, and other numeric characters)
    • : - a colon
    • , - a comma
  • ]+ - end of the character class, repeated one or more times.
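One detail about \p{N} in the list above: it covers every Unicode number category, not only decimal digits. The sketch below contrasts it with \p{Nd} (decimal digits only). The class and method names are hypothetical and exist only for illustration; non-ASCII characters are written as \u escapes so the file compiles regardless of source encoding.

```java
public class NumberClassSketch {
    // Keeps letters, all Unicode numbers (\p{N}), colons and commas.
    static String keepAllNumbers(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}:,]+", "");
    }

    // Keeps letters, decimal digits only (\p{Nd}), colons and commas.
    static String keepDecimalDigitsOnly(String s) {
        return s.replaceAll("[^\\p{L}\\p{Nd}:,]+", "");
    }

    public static void main(String[] args) {
        // \u00BD is VULGAR FRACTION ONE HALF (category No),
        // \u2167 is ROMAN NUMERAL EIGHT (category Nl).
        String input = "a \u00BD \u2167 7";
        System.out.println(keepAllNumbers(input));        // fraction and numeral survive
        System.out.println(keepDecimalDigitsOnly(input)); // only 'a' and '7' survive
    }
}
```

If only 0-9 style digits should be kept, \p{Nd} is probably the closer fit; \p{N} is the broader choice.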

A Java demo:

class Test
{
    public static void main (String[] args)
    {
        String st = "I, Love: ( Coding {} -), codificación";
        System.out.println(cleanText(st));
    }

    // Removes every run of characters that are not letters, numbers, colons or commas
    public static String cleanText(String text) {
        return text.replaceAll("[^\\p{L}\\p{N}:,]+", "");
    }
}
// => I,Love:Coding,codificación
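Since the question asks about keeping "certain" characters in general, the same idea can be sketched with the kept characters passed in as a parameter. This is only an illustrative sketch, not part of the original answer: the class name, the method names and the keep parameter are all assumptions. Each kept character is backslash-escaped, which Java regex permits before any non-alphabetic character, so symbols such as ] or - do not break the character class.

```java
import java.util.regex.Pattern;

public class CleanTextSketch {
    // Hypothetical helper: builds a pattern matching runs of characters that are
    // neither Unicode letters, Unicode numbers, nor any character listed in `keep`.
    static Pattern buildCleaner(String keep) {
        StringBuilder cls = new StringBuilder("[^\\p{L}\\p{N}");
        keep.codePoints().forEach(cp -> {
            // Letters and digits are already covered by \p{L} and \p{N};
            // everything else is escaped with a backslash.
            if (!Character.isLetterOrDigit(cp)) {
                cls.append('\\').appendCodePoint(cp);
            }
        });
        cls.append("]+");
        return Pattern.compile(cls.toString());
    }

    // Removes everything except letters, numbers and the characters in `keep`.
    static String cleanText(String text, String keep) {
        return buildCleaner(keep).matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // \u00F3 is the accented o from the question's example string.
        String st = "I, Love: ( Coding {} -), codificaci\u00F3n";
        System.out.println(cleanText(st, ":,"));
        System.out.println(cleanText("a-b c", "-"));
    }
}
```

When the same set of kept characters is reused many times, compiling the pattern once and storing it is likely cheaper than rebuilding it on every call.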