Splitting input string when it contains countries with multiple words

Time:01-28

I get multiple countries as input that I have to split by space. If a country name has multiple words, it is enclosed in double quotes. For example:

Chad Benin Angola Algeria Finland Romania "Democratic Republic of the Congo" Bolivia Uzbekistan Lesotho "United States of America"

At the moment I'm able to split the countries word by word, so United States of America doesn't stay together as one country.

    BufferedReader reader = new BufferedReader(
            new InputStreamReader(System.in));
    // Reading data using readLine
    String str = reader.readLine();
    ArrayList<String> sets = new ArrayList<String>();

    String[] newStr = str.split("[\\W]");
    boolean check = false;
    for (String s : newStr) {
        sets.add(s);
    }
    System.out.print(sets);

How can I split these countries so that the multi-word countries don't get split?

CodePudding user response:

Instead of matching what to split on, match the country names themselves. You need to catch either a run of letters, or letters and spaces between quotes. Match one or more letters - [a-zA-Z]+ , or (|) match letters and spaces between quotes - "[a-zA-Z\s]+".

    String input = "Chad Benin Angola Algeria Finland Romania \"Democratic Republic of the Congo\" Bolivia Uzbekistan Lesotho \"United States of America\"";
    // Requires java.util.regex.Pattern and java.util.regex.Matcher
    Pattern pattern = Pattern.compile("[a-zA-Z]+|\"[a-zA-Z\\s]+\"");
    Matcher matcher = pattern.matcher(input);
    while (matcher.find()) {
      String result = matcher.group();
      if (result.startsWith("\"")) {
        //quotes are matched, so remove them
        result = result.substring(1, result.length() - 1);
      }
      System.out.println(result);
    }
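
A variation on the same "match instead of split" idea, as a minimal sketch: capturing groups let the matcher hand back the quoted text without the quotes, and matching "anything except whitespace or quotes" also accepts names with hyphens or accents. The class name CountrySplitter, the method splitCountries, and the Guinea-Bissau sample entry are assumptions made for illustration and are not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountrySplitter {
    // Group 1: the text between two double quotes (quotes excluded).
    // Group 2: a run of characters that are neither whitespace nor quotes.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|([^\\s\"]+)");

    // Hypothetical helper: returns every country name found in the input line.
    static List<String> splitCountries(String input) {
        List<String> countries = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            String quoted = matcher.group(1);
            // group(1) is null when the unquoted alternative matched
            countries.add(quoted != null ? quoted : matcher.group(2));
        }
        return countries;
    }

    public static void main(String[] args) {
        String input = "Chad Benin \"Democratic Republic of the Congo\" Guinea-Bissau \"United States of America\"";
        // prints [Chad, Benin, Democratic Republic of the Congo, Guinea-Bissau, United States of America]
        System.out.println(splitCountries(input));
    }
}
```

Note that an empty pair of quotes would be returned as an empty string here; filter those out if the input may contain them.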

CodePudding user response:

Hm, maybe I am not intelligent enough, but I do not see a one-line solution. I can think of the following, though:

    // Requires java.util.ArrayList, java.util.Arrays, java.util.List,
    // java.util.Optional and java.util.stream.Collectors
    public static void main(String[] args) {
        String inputString = "Chad Benin Angola Algeria Finland Romania \"Democratic Republic of the Congo\" Bolivia Uzbekistan Lesotho \"United States of America\"\n";

        List<String> resultCountriesList = new ArrayList<>();
        int currentIndex = 0;
        boolean processingMultiWordsCountry = false;
        for (int i = 0; i < inputString.length(); i++) {
            Optional<String> substringAsOptional = extractNextSubstring(inputString, currentIndex);
            if (substringAsOptional.isPresent()) {
                String substring = substringAsOptional.get();
                // Skip past the substring and the quote that ended it
                currentIndex += substring.length() + 1;
                if (processingMultiWordsCountry) {
                    resultCountriesList.add(substring);
                } else {
                    resultCountriesList.addAll(Arrays.stream(substring.split(" ")).map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toList()));
                }
                processingMultiWordsCountry = !processingMultiWordsCountry;
            }
        }

        System.out.println(resultCountriesList);
    }

    private static Optional<String> extractNextSubstring(String inputString, int currentIndex) {
        if (inputString.length() > currentIndex + 1) {
            int nextQuote = inputString.indexOf("\"", currentIndex + 1);
            // No further quote: take the rest of the string instead of failing on index -1
            int end = nextQuote >= 0 ? nextQuote : inputString.length();
            return Optional.of(inputString.substring(currentIndex, end));
        }
        return Optional.empty();
    }

The resulting list of country names ends up in resultCountriesList. The code iterates over the string, repeatedly taking the substring of inputString from the end of the previous substring (currentIndex) to the next occurrence of the " character. If a substring is present, processing continues. The boolean flag processingMultiWordsCountry distinguishes countries enclosed in quotes from those outside them.

So, at least for now, I cannot find anything better. I do not think this code is ideal, and there is probably a lot of room for improvement, so if you have any suggestions, feel free to add a comment. Hope it helped, have a nice day!
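
The quote-toggling idea in this answer can also be sketched more compactly by splitting on the quote character itself: the resulting segments alternate between "outside quotes" and "inside quotes", so the index parity plays the role of the boolean flag. This is only a sketch under the assumption that quotes in the input are balanced; the class name QuoteToggleSplitter and the method splitCountries are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteToggleSplitter {
    // Hypothetical helper; assumes every opening quote has a matching closing quote.
    static List<String> splitCountries(String input) {
        List<String> countries = new ArrayList<>();
        // Splitting on the quote yields alternating segments:
        // even indexes lie outside quotes, odd indexes lie inside them.
        String[] segments = input.split("\"");
        for (int i = 0; i < segments.length; i++) {
            if (i % 2 == 1) {
                // Inside quotes: keep the whole segment as one country
                countries.add(segments[i].trim());
            } else {
                // Outside quotes: every whitespace-separated word is a country
                for (String word : segments[i].trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        countries.add(word);
                    }
                }
            }
        }
        return countries;
    }

    public static void main(String[] args) {
        String input = "Chad Benin Angola Algeria Finland Romania \"Democratic Republic of the Congo\" Bolivia Uzbekistan Lesotho \"United States of America\"\n";
        System.out.println(splitCountries(input));
    }
}
```

With an unbalanced quote, everything after the last quote would be treated as a single quoted name, so input that may be malformed needs an extra check.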
