Check whether two strings have the same contents regardless of word order

I have two strings, e.g.:

Long sentences may be used for several reasons: To develop tension. While a short sentence is the ultimate sign of the tension, long sentences could be used to develop this tension to a point of culmination. To give vivid descriptions.

Long sentences may be used for several reasons: To develop tension. While a short sentence is the sign ultimate of the tension, long sentences could be used to develop this tension to a point of culmination. To give vivid descriptions.

In the second string, the word ultimate has changed its position.

if(string1.equalsIgnoreCase(string2)) returns false, but I want the result to be true, since the contents of the strings are the same (even though the word order is not).

Answer:

You could count the occurrences of every word in each String and compare the results:

import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

String phrase = "Long sentences may be used for several reasons: To develop tension. While a short sentence is the ultimate sign of the tension, long sentences could be used to develop this tension to a point of culmination. To give vivid descriptions.";
String phrase2 = "Long sentences may be used for several reasons: To develop tension. While a short sentence is the sign ultimate of the tension, long sentences could be used to develop this tension to a point of culmination. To give vivid descriptions.";

// Map each lower-cased word to the number of times it occurs
Map<String,Long> wordCount = Arrays.stream(phrase.toLowerCase().split("\\W+"))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

Map<String,Long> wordCount2 = Arrays.stream(phrase2.toLowerCase().split("\\W+"))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

// Equal maps mean the same words with the same counts, in any order
System.out.println(wordCount.equals(wordCount2));

The first step is to apply toLowerCase() to your String. Remove this step if you want your comparison to be case sensitive.
- "Hello world" => "hello world"
Then you split() the String around the matches of the regex \W+ (written "\\W+" in a Java string literal) to obtain an array. This regex matches one or more non-word characters, i.e. anything other than a letter, digit or underscore.
- "hello world" => ["hello", "world"]
You call Arrays.stream() on this array to get a Stream.
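You collect the elements of the Stream using Collectors.groupingBy() to associate every word with its number of occurrences. Function.identity() is a function that returns its input, so each word becomes its own key, and Collectors.counting() counts how many times that key appears.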
- {"hello": 1, "world": 1}
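The steps above can be wrapped in a small reusable class. This is a sketch rather than part of the original answer: the class name WordCountCompare and the methods wordCounts and sameWordCounts are hypothetical, and the extra filter() step is an assumption added for one edge case. When a string starts with punctuation, split() returns a leading empty string, so "...hello world" would otherwise not match "hello world".

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountCompare {

    // Builds a word -> occurrence-count map for one string (hypothetical helper).
    public static Map<String, Long> wordCounts(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                // split() yields a leading "" when the text starts with a non-word character; drop it
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // True when both strings contain the same words the same number of times, in any order.
    public static boolean sameWordCounts(String a, String b) {
        return wordCounts(a).equals(wordCounts(b));
    }

    public static void main(String[] args) {
        System.out.println(wordCounts("Hello world, hello!").get("hello"));          // 2
        System.out.println(sameWordCounts("the ultimate sign", "The sign ultimate"));  // true
        System.out.println(sameWordCounts("...hello world", "hello world"));           // true
        System.out.println(sameWordCounts("hello hello world", "hello world world"));  // false
    }
}
```

The last call shows why counts matter: both strings use the same two words, but with different frequencies.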

Answer:

This is an alternative to Tom's answer. It should be faster for smallish strings, because it does not spend time and memory building a word-count map. For very long strings, however, Tom's version should be better: such texts will likely contain many duplicate words, so comparing word counts should be much faster than sorting and comparing every single word, especially once the text reaches the million-word range.

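import java.util.Arrays;

public static boolean equalWordsAndCountsIgnoringOrder(String s1, String s2) {
    // Lower-case both strings and split them into words on runs of non-word characters
    String[] w1 = s1.toLowerCase().split("\\W+");
    String[] w2 = s2.toLowerCase().split("\\W+");
    // Sort so that the same multiset of words always produces the same array
    Arrays.sort(w1);
    Arrays.sort(w2);
    return Arrays.equals(w1, w2);
}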

We first convert the strings into arrays with their words lower-cased. Then we sort those arrays. Finally, we test that the exact same words appear in the exact same places.
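To make these three steps concrete, here is a trace on a shortened pair of phrases. The class SortedWordsDemo and its sortedWords helper are hypothetical names used only for this illustration; the logic is the same lower-case, split and sort sequence.

```java
import java.util.Arrays;

public class SortedWordsDemo {

    // Lower-cases, splits on runs of non-word characters, then sorts the words alphabetically.
    public static String[] sortedWords(String text) {
        String[] words = text.toLowerCase().split("\\W+");
        Arrays.sort(words);
        return words;
    }

    public static void main(String[] args) {
        String[] a = sortedWords("the ultimate sign");
        String[] b = sortedWords("The sign ultimate");
        System.out.println(Arrays.toString(a)); // [sign, the, ultimate]
        System.out.println(Arrays.toString(b)); // [sign, the, ultimate]
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Once both arrays are sorted, a moved word ends up in the same slot in each, so an element-by-element comparison is enough.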

Usage:

System.out.println(equalWordsAndCountsIgnoringOrder(phrase, phrase2));
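For reference, here is a self-contained program that runs the method on the two phrases from the question and contrasts it with equalsIgnoreCase(). The wrapper class name WordOrderCheck is hypothetical; the method body is the one from this answer.

```java
import java.util.Arrays;

public class WordOrderCheck {

    public static boolean equalWordsAndCountsIgnoringOrder(String s1, String s2) {
        // Lower-case, split on runs of non-word characters, sort, then compare element by element
        String[] w1 = s1.toLowerCase().split("\\W+");
        String[] w2 = s2.toLowerCase().split("\\W+");
        Arrays.sort(w1);
        Arrays.sort(w2);
        return Arrays.equals(w1, w2);
    }

    public static void main(String[] args) {
        String phrase = "Long sentences may be used for several reasons: To develop tension. While a short sentence is the ultimate sign of the tension, long sentences could be used to develop this tension to a point of culmination. To give vivid descriptions.";
        String phrase2 = "Long sentences may be used for several reasons: To develop tension. While a short sentence is the sign ultimate of the tension, long sentences could be used to develop this tension to a point of culmination. To give vivid descriptions.";

        System.out.println(phrase.equalsIgnoreCase(phrase2)); // false: character-by-character comparison
        System.out.println(equalWordsAndCountsIgnoringOrder(phrase, phrase2)); // true: same words, different order
        // Replacing a word, rather than moving it, makes the check fail
        System.out.println(equalWordsAndCountsIgnoringOrder(phrase, phrase2.replace("vivid", "lively"))); // false
    }
}
```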