How to separate a String using a List of words?

Time:01-09

How could I separate a String using a pre-given List of Strings, separating them by spaces?

For example:

List of words: words = {"hello", "how", "are", "you"}

The string I want to separate: text = "hellohowareyou"

public static String separateText(String text, List<String> words) {
    String new_text;

    for (String word : words) {
        if (text.startsWith(word)) {
            String suffix = text.substring(word.length());  // 'suffix' is 'text' without its first word
            new_text += " " + word;  // add the first word of the 'string'
            separateString(suffix, words);
        }
    }
    
    return new_text;
}

The returned new_text should be: hello how are you

Note that the order of the List words could be different and also have more words, like a dictionary.

How could I make this recursive, if needed?

CodePudding user response:

This should do what you want

  • You should use StringBuilder if you find yourself repeatedly appending to a string
  • Use a while loop to iterate through text, remove one word at a time and finish when text is empty
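public static String separateText(String text, List<String> words) {
    StringBuilder newTextBuilder = new StringBuilder();

    outerLoop:
    while (text.length() > 0) {
        for (String word : words) {
            // skip empty words: they would match forever without consuming any text
            if (!word.isEmpty() && text.startsWith(word)) {
                newTextBuilder.append(word + " ");
                text = text.substring(word.length());
                continue outerLoop;
            }
        }
        // no word matched: keep the rest as it is and stop, otherwise the loop never ends
        newTextBuilder.append(text);
        break;
    }

    // trim the trailing space left behind by the last appended word
    return newTextBuilder.toString().trim();
}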

CodePudding user response:

How could I separate a String using a pre-given List of Strings, separating them by spaces?

Pretty much the way you already started: check whether the remaining text starts with any of the words from the list, remove that starting word, and keep the suffix.

You did all of that already, but instead of keeping the suffix and continuing to iterate, you tried to call separateText recursively.

That is also possible, but simply iterating in a while loop until the suffix (the remaining text) is empty is enough.

A loop like while (index < text.length()) also works for longer inputs, even if the words are in a different order.

public String separateText(String text, List<String> words){
    if (text == null) return "";
    if (words == null || words.isEmpty()) return text;

    StringBuilder sb = new StringBuilder();

    boolean unknownWord = false;
    int index = 0;
    while (index < text.length()) {
        boolean wordFound = false;
        for (String word : words) {
            if (!word.isEmpty() && text.startsWith(word, index)) {
                wordFound = true;
                // move the index ahead just past the last letter of the word found
                index += word.length();
                if (unknownWord) {
                    unknownWord = false;
                    sb.append(" ");
                }
                sb.append(word);
                sb.append(" ");
                break;
            }
        }
        if (!wordFound) {
            unknownWord = true;
            sb.append(text.charAt(index));
            index++;
        }
    }

    return sb.toString();
}

CodePudding user response:

This solution is pretty simple, but it is not memory-optimal, because each call to replace creates a new String.

public static String separate(String str, Set<String> words) {
    for (String word : words)
        str = str.replace(word, word + ' ');

    return str.trim();
}

Demo

Set<String> words = Set.of("hello", "how", "are", "you");
System.out.println(separate("wow hellohowareyouhellohowareyou", words));
// wow hello how are you hello how are you

Another solution uses a StringBuilder and looks better to me from a performance point of view.

public static String separate(String str, Set<String> words) {
    List<String> res = new LinkedList<>();
    StringBuilder buf = new StringBuilder();

    for (int i = 0; i < str.length(); i++) {
        buf.append(str.charAt(i));

        if (str.charAt(i) == ' ' || words.contains(buf.toString())) {
            res.add(buf.toString().trim());
            buf.delete(0, buf.length());
        }
    }

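    // flush trailing characters that did not complete a known word
    if (buf.length() > 0)
        res.add(buf.toString().trim());

    return String.join(" ", res);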
}

CodePudding user response:

For a recursive method try the following:

public static String separateText(String text, List<String> words){
    return separateText(text, words, new StringBuilder());
}

public static String separateText(String text, List<String> words, StringBuilder result){

    for(String word : words){
        if (text.startsWith(word)){
           result.append(word).append(" ");
           text = text.substring(word.length());
           ArrayList<String> newList = new ArrayList<>(words);
           newList.remove(word);
           separateText(text, newList, result);
           break;
        }
    }

    return result.toString().trim();
}
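
The recursive version above removes each matched word from the list, so it assumes every word occurs at most once in the text, and it never reconsiders a match that leads to a dead end. If the list is more like a dictionary, a backtracking recursion may be a better fit. The following is only a sketch under my own assumptions: the class name BacktrackingSplitter, the helper split, and the choice to return an Optional (empty when the text cannot be split) are not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class BacktrackingSplitter {

    /** Returns the words of text joined by spaces, or empty if text cannot be split. */
    static Optional<String> separateText(String text, List<String> words) {
        List<String> parts = new ArrayList<>();
        return split(text, 0, words, parts)
                ? Optional.of(String.join(" ", parts))
                : Optional.empty();
    }

    // Tries every word that matches at position 'start' and undoes the
    // choice if the rest of the text cannot be split after it.
    private static boolean split(String text, int start, List<String> words, List<String> parts) {
        if (start == text.length()) {
            return true; // the whole text has been consumed
        }
        for (String word : words) {
            if (!word.isEmpty() && text.startsWith(word, start)) {
                parts.add(word);
                if (split(text, start + word.length(), words, parts)) {
                    return true;
                }
                parts.remove(parts.size() - 1); // backtrack and try the next word
            }
        }
        return false;
    }
}
```

With the words "he", "heat" and "hen", the text "heathen" is first tried as "he" + "athen", which fails, and the method then backtracks to "heat" + "hen". Without memoization the running time can grow exponentially for unlucky inputs, so treat this as a starting point rather than a finished solution.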

CodePudding user response:

import java.util.*;

public class Main {
    public static void main(String[] args) throws Exception {
        // You must sort this list by length (longest first), or the result may be wrong,
        // because a shorter word could match where a longer one was intended.
        // In this example, the list is already sorted.
        List<String> words = Arrays.asList("hello", "how", "are", "you");
        List<String> detectedWords = new ArrayList<>();
        String text = "hellohowareyou";
        int i = 0;
        while (i < text.length()) {
            Optional<String> wordOpt = Optional.empty();

            for (String word : words) {
                if (text.startsWith(word, i)) { // the word must begin exactly at position i
                    wordOpt = Optional.of(word);
                    break;
                }
            }
            if (wordOpt.isPresent()) {
                String wordFound = wordOpt.get();
                i += wordFound.length();
                detectedWords.add(wordFound);
            } else {
                // no word matches at position i: stop instead of looping forever
                break;
            }
        }
        String result = String.join(" ", detectedWords);
        System.out.println(result);
    }
}
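
The comment in the code above asks for the word list to be sorted by length. One possible way to do that is sketched below. The class name LongestFirst and the method sortedLongestFirst are names I made up for illustration. The method copies the list first, on the assumption that the caller's list may be immutable (for example one created with List.of).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LongestFirst {

    /** Returns a copy of words ordered by length, longest first; the input list is not modified. */
    static List<String> sortedLongestFirst(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        // List.sort is stable, so words of equal length keep their original order
        sorted.sort(Comparator.comparingInt(String::length).reversed());
        return sorted;
    }
}
```

The order matters when one word is a prefix of another: with "he" listed before "hello", a first-match loop consumes "he" from "hellohow" and is then stuck on "llohow". Sorting longest first avoids that particular case, although it does not help when the longest match is itself the wrong choice.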

I assumed:

  • Your text will never be null
  • Your text matches the regex ^(hello|how|are|you)+$
  • Your words are sorted by length, longest first