Java Regex: Text extraction into an array list

Time:01-06

I'm struggling with a simple regex that I can't seem to get right.

I have some text like so:

This comment is great **[@madeUpUser1](/madeUpUser1)** You said something similar did you mate? **[@madeUpUser2](/madeUpUser2)**

What I would like to end up with is an array list containing the usernames in between the parentheses, i.e.:

0.madeUpUser1
1.madeUpUser2

And here is the code I have so far:

List<String> matches = Pattern.compile("\\((.+?)\\)")
        .matcher("This comment is great **[@madeUpUser1](/madeUpUser1)** You said something similar did you mate? **[@madeUpUser2](/madeUpUser2)**")
        .results()
        .map(MatchResult::group)
        .collect(Collectors.toList());

However what I'm getting back is this:

0."(/madeUpUser1)"
1."(/madeUpUser2)"

Again, where I want:

0.madeUpUser1
1.madeUpUser2

i.e. without the parentheses and without the forward slash.

Can anyone shed any light on what I'm doing wrong with my regex please?

CodePudding user response:

Try this regex:

(?<=\\(/)[^)]+(?=\\))


Explanation

  • (?<=\\(/) - positive lookbehind to make sure that the current position is preceded by a (/

  • [^)]+ - matches 1 or more occurrences (as many as possible) of any character that is not a )

  • (?=\\)) - positive lookahead to make sure that the current position is followed by a )
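
A minimal sketch of how the lookaround pattern above could be dropped into the code from the question. The class name LookaroundExtract and the helper extract are made up for illustration. Because the lookbehind and lookahead do not consume any characters, the whole match is already the bare username, so MatchResult::group can stay as it is:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this sketch.
class LookaroundExtract {
    // Lookbehind requires "(/" right before the match, lookahead requires ")" right after it.
    private static final Pattern USER = Pattern.compile("(?<=\\(/)[^)]+(?=\\))");

    // Hypothetical helper: returns every whole match found in the text.
    public static List<String> extract(String text) {
        return USER.matcher(text)
                .results()                 // Matcher.results() needs Java 9 or later
                .map(MatchResult::group)   // group() is the whole match, here just the username
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "This comment is great **[@madeUpUser1](/madeUpUser1)** "
                + "You said something similar did you mate? **[@madeUpUser2](/madeUpUser2)**";
        extract(text).forEach(System.out::println);
        // prints:
        // madeUpUser1
        // madeUpUser2
    }
}
```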

With the regex you are using, \\((.+?)\\), the following happens:

  • \\( - matches the opening parenthesis (
  • (.+?) - matches any character (except a newline character) 1 or more times, as few as possible. This subpattern keeps expanding the match until it reaches the ). That's why it matches everything between the parentheses (even the /)
  • \\) - matches the closing parenthesis )
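
To make the breakdown above concrete, here is a small sketch (the class and method names are invented for the example) that runs the question's pattern twice: once mapping with group(), which returns the whole match including both parentheses, and once with group(1), which returns only what the capture group saw. Even with group(1) the leading / is still there, since the pattern never matches it outside the group:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this sketch.
class GroupDemo {
    // The pattern from the question: "(", then as few chars as possible, then ")".
    private static final Pattern ORIGINAL = Pattern.compile("\\((.+?)\\)");

    // Whole match (group 0): includes the parentheses.
    public static List<String> wholeMatches(String text) {
        return ORIGINAL.matcher(text).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    // Capture group 1: parentheses dropped, but the leading "/" remains.
    public static List<String> firstGroups(String text) {
        return ORIGINAL.matcher(text).results()
                .map(mr -> mr.group(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "**[@madeUpUser1](/madeUpUser1)** and **[@madeUpUser2](/madeUpUser2)**";
        System.out.println(wholeMatches(text)); // [(/madeUpUser1), (/madeUpUser2)]
        System.out.println(firstGroups(text));  // [/madeUpUser1, /madeUpUser2]
    }
}
```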

CodePudding user response:

You can match ](/ and then capture any zero or more chars other than ( and ) till the next ), and collect Group 1 matches only:

import java.util.*;
import java.util.regex.*;
import java.util.stream.Collectors;


class Test
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String text = "This comment is great **[@madeUpUser1](/madeUpUser1)** You said something similar did you mate? **[@madeUpUser2](/madeUpUser2)**";

        Pattern p = Pattern.compile("]\\(/([^()]*)\\)");
        List<String> results = p.matcher(text)
            .results()
            .map(mr -> mr.group(1))
            .collect(Collectors.toList());
        
        // Or, to get a string array:
        // String[] results = p.matcher(text).results().map(mr -> mr.group(1)).toArray(String[]::new);

        for (String x: results) {
            System.out.println(x);
        }
    }
}

Output:

madeUpUser1
madeUpUser2

Details:

  • ]\(/ - a ](/ string
  • ([^()]*) - Capturing group 1: any zero or more chars other than ) and (
  • \) - a ) char.

CodePudding user response:

You can use a capture group and match the surrounding parentheses (or square brackets):

\(/([^\s()]+)\)
  • \(/ Match (/
  • ( Capture group 1
    • [^\s()]+ Match 1 or more chars other than a whitespace char or ( )
  • ) Close group 1
  • \) Match )


List<String> matches = Pattern.compile("\\(/([^\\s()]+)\\)")
    .matcher("This comment is great **[@madeUpUser1](/madeUpUser1)** You said something similar did you mate? **[@madeUpUser2](/madeUpUser2)**")
    .results()
    .map(m -> m.group(1))
    .collect(Collectors.toList());

for (String s : matches)
    System.out.println(s);

Output

madeUpUser1
madeUpUser2

In the example, the text between the square brackets is the same username, so another option, using the same code, is to match that part instead:

\[@([^\s\]\[]+)]
  • \[@ Match [@
  • ( Capture group 1
    • [^\s\]\[]+ Match 1 or more chars other than a whitespace char or [ ]
  • ) Close group 1
  • ] Match ]
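
For completeness, a sketch of the square-bracket variant written as a Java string literal, where every backslash in the pattern is doubled. The class name BracketExtract and the method extract are placeholders, and the sketch assumes the bracketed name always matches the one in the link, as it does in the sample text:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this sketch.
class BracketExtract {
    // "[@", then capture 1+ chars that are not whitespace or square brackets, then "]".
    private static final Pattern MENTION = Pattern.compile("\\[@([^\\s\\]\\[]+)]");

    // Hypothetical helper: collects capture group 1 of every match.
    public static List<String> extract(String text) {
        return MENTION.matcher(text)
                .results()
                .map(mr -> mr.group(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "This comment is great **[@madeUpUser1](/madeUpUser1)** "
                + "You said something similar did you mate? **[@madeUpUser2](/madeUpUser2)**";
        extract(text).forEach(System.out::println);
        // prints:
        // madeUpUser1
        // madeUpUser2
    }
}
```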
