Check whether two strings have same contents regardless of word order

Time:01-21

I have two strings, e.g.:

Long sentences may be used for several reasons: To develop tension. While a short sentence is the ultimate sign of the tension, long sentences could be used to develop this tension to a point of culmination. To give vivid descriptions.

Long sentences may be used for several reasons: To develop tension. While a short sentence is the sign ultimate of the tension, long sentences could be used to develop this tension to a point of culmination. To give vivid descriptions.

In the second string, the word ultimate has changed its position.

string1.equalsIgnoreCase(string2) returns false, but I want the result to be true, since both strings contain the same words (only the order differs).

CodePudding user response:

You could count the occurrences of every word in each String and compare the results:

// Needs java.util.Arrays, java.util.Map, java.util.function.Function, java.util.stream.Collectors
String phrase = "Long sentences may be used for several reasons: To develop tension. While a short sentence is the ultimate sign of the tension, long sentences could be used to develop this tension to a point of culmination. To give vivid descriptions.";
String phrase2 = "Long sentences may be used for several reasons: To develop tension. While a short sentence is the sign ultimate of the tension, long sentences could be used to develop this tension to a point of culmination. To give vivid descriptions.";

// Lower-case each phrase, split it on runs of non-word characters, and count each word.
Map<String,Long> wordCount = Arrays.stream(phrase.toLowerCase().split("\\W+"))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

Map<String,Long> wordCount2 = Arrays.stream(phrase2.toLowerCase().split("\\W+"))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

// Equal maps mean the same words with the same counts, in any order.
System.out.println(wordCount.equals(wordCount2)); // true
  • The first step is to apply toLowerCase() to your String. Remove this step if you want your comparison to be case sensitive.
    • "Hello world" => "hello world"
  • Then you split() the String around the matches of the following regex: \W+ to obtain an array. This regex matches one or more non-word characters.
    • "hello world" => ["hello", "world"]
  • You call Arrays.stream() on this array to get a Stream.
  • You collect the elements of the Stream using Collectors.groupingBy() to associate every word with its number of occurrences. Function.identity() is a function that returns its input.
    • {"hello": 1, "world": 1}
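The steps above can be wrapped in a small reusable helper. This is only a sketch: the class and method names (WordCountCompare, wordCounts, sameWords) are made up for illustration, and it adds one step that is not in the snippet above, namely dropping the empty token that split() produces when a string starts with punctuation.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountCompare {

    // Hypothetical helper: maps each lower-cased word to its number of occurrences.
    public static Map<String, Long> wordCounts(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                // split() yields a leading "" when the text starts with a non-word character
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // True when both strings contain the same words the same number of times.
    public static boolean sameWords(String a, String b) {
        return wordCounts(a).equals(wordCounts(b));
    }

    public static void main(String[] args) {
        System.out.println(sameWords("Hello, world", "world hello"));      // true
        System.out.println(sameWords("hello hello world", "hello world")); // false
    }
}
```

Because the counts are compared, a repeated word must be repeated the same number of times in both strings, which is why the second call prints false.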

CodePudding user response:

This is an alternative to Tom's answer. It should be faster for small-ish strings, since it does not spend time and memory building word counts. Tom's version will be better for very long strings (which will likely contain many duplicate words): once the text reaches the million-word range or so, comparing word counts should be much faster than sorting and comparing the full word lists.

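// Needs java.util.Arrays
public static boolean equalWordsAndCountsIgnoringOrder(String s1, String s2) {
    // Lower-case, then split on runs of non-word characters
    String[] w1 = s1.toLowerCase().split("\\W+");
    String[] w2 = s2.toLowerCase().split("\\W+");
    // After sorting, equal word lists line up position by position
    Arrays.sort(w1);
    Arrays.sort(w2);
    return Arrays.equals(w1, w2);
}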

We first convert the strings into arrays with their words lower-cased. Then we sort those arrays. Finally, we test that the exact same words appear in the exact same places.

Usage:

System.out.println(equalWordsAndCountsIgnoringOrder(phrase, phrase2));
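One caveat applies to both answers: by default, Java's \W treats every non-ASCII letter as a non-word character, so "café" is split into just "caf". If the input may contain such letters, a variant along these lines might help. It is a sketch under that assumption, not part of the original answer; the class name SortedWordsCompare and the helper sortedWords are invented for the example, and it relies on the Pattern.UNICODE_CHARACTER_CLASS flag to make \W Unicode-aware.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SortedWordsCompare {

    // Assumption: words may contain non-ASCII letters, so \W is made Unicode-aware.
    private static final Pattern NON_WORD =
            Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: lower-cased words of the text, sorted, without empty tokens.
    public static String[] sortedWords(String text) {
        return NON_WORD.splitAsStream(text.toLowerCase())
                .filter(word -> !word.isEmpty())
                .sorted()
                .toArray(String[]::new);
    }

    public static boolean equalWordsAndCountsIgnoringOrder(String s1, String s2) {
        return Arrays.equals(sortedWords(s1), sortedWords(s2));
    }

    public static void main(String[] args) {
        // \u00e9 is the letter e with an acute accent, written as an escape to avoid encoding issues
        System.out.println(equalWordsAndCountsIgnoringOrder("Un caf\u00e9 noir", "noir, un caf\u00e9")); // true
        System.out.println(equalWordsAndCountsIgnoringOrder("caf\u00e9", "caf"));                       // false
    }
}
```

With the plain "\\W+" split, the second call would report true, because the accented letter is discarded as a separator.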