I get multiple countries as input that I have to split on spaces. If a country name has multiple words, it is enclosed in double quotes. For example:
Chad Benin Angola Algeria Finland Romania "Democratic Republic of the Congo" Bolivia Uzbekistan Lesotho "United States of America"
At the moment I'm only able to split the countries word by word, so United States of America doesn't stay together as one country.
BufferedReader reader = new BufferedReader(
        new InputStreamReader(System.in));
// Reading data using readLine
String str = reader.readLine();
ArrayList<String> sets = new ArrayList<String>();
String[] newStr = str.split("[\\W]");
boolean check = false;
for (String s : newStr) {
    sets.add(s);
}
System.out.print(sets);
How can I split these countries so that the multi-word countries don't get split?
CodePudding user response:
Instead of matching what to split on, match the country names themselves. You need to catch either letters, or letters and spaces between quotes. Match one or more letters with [a-zA-Z]+, or (|) match letters and whitespace between quotes with "[a-zA-Z\s]+".
String input = "Chad Benin Angola Algeria Finland Romania \"Democratic Republic of the Congo\" Bolivia Uzbekistan Lesotho \"United States of America\"";
// Either a bare run of letters, or a quoted run of letters and whitespace
Pattern pattern = Pattern.compile("[a-zA-Z]+|\"[a-zA-Z\\s]+\"");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    String result = matcher.group();
    if (result.startsWith("\"")) {
        // quotes are matched, so remove them
        result = result.substring(1, result.length() - 1);
    }
    System.out.println(result);
}
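Since the question collects the names into a list, here is a minimal sketch of the same matching idea wrapped in a method. The class and method names (CountrySplitter, splitCountries) are made up for illustration, and the pattern is a small variation that uses a capture group so the quotes never have to be stripped by hand. It keeps the assumption that names contain only ASCII letters and spaces, so something like Guinea-Bissau would still be broken apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer
class CountrySplitter {
    // Alternative 1: quoted letters/whitespace, captured without the quotes.
    // Alternative 2: a bare run of letters.
    private static final Pattern COUNTRY =
            Pattern.compile("\"([a-zA-Z\\s]+)\"|[a-zA-Z]+");

    static List<String> splitCountries(String input) {
        List<String> countries = new ArrayList<>();
        Matcher matcher = COUNTRY.matcher(input);
        while (matcher.find()) {
            // group(1) is non-null only when the quoted alternative matched
            String quoted = matcher.group(1);
            countries.add(quoted != null ? quoted : matcher.group());
        }
        return countries;
    }

    public static void main(String[] args) {
        String input = "Chad Benin \"Democratic Republic of the Congo\" Bolivia \"United States of America\"";
        // prints [Chad, Benin, Democratic Republic of the Congo, Bolivia, United States of America]
        System.out.println(splitCountries(input));
    }
}
```

In the question's code, the line read from System.in could be passed straight to splitCountries instead of calling split.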
CodePudding user response:
Hm, maybe I'm not clever enough, but I don't see a one-line solution. I can think of the following one, though:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        String inputString = "Chad Benin Angola Algeria Finland Romania \"Democratic Republic of the Congo\" Bolivia Uzbekistan Lesotho \"United States of America\"\n";
        List<String> resultCountriesList = new ArrayList<>();
        int currentIndex = 0;
        // Segments alternate: outside quotes, inside quotes, outside quotes, ...
        boolean processingMultiWordsCountry = false;
        for (int i = 0; i < inputString.length(); i++) {
            Optional<String> substringAsOptional = extractNextSubstring(inputString, currentIndex);
            if (substringAsOptional.isPresent()) {
                String substring = substringAsOptional.get();
                // Skip past this segment and the quote that ended it
                currentIndex += substring.length() + 1;
                if (processingMultiWordsCountry) {
                    resultCountriesList.add(substring);
                } else {
                    resultCountriesList.addAll(Arrays.stream(substring.split(" ")).map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toList()));
                }
                processingMultiWordsCountry = !processingMultiWordsCountry;
            }
        }
        System.out.println(resultCountriesList);
    }

    // Returns the text from currentIndex up to the next quote (or the end of the input if no quote is left)
    private static Optional<String> extractNextSubstring(String inputString, int currentIndex) {
        if (inputString.length() > currentIndex + 1) {
            int nextQuote = inputString.indexOf("\"", currentIndex + 1);
            int end = nextQuote >= 0 ? nextQuote : inputString.length();
            return Optional.of(inputString.substring(currentIndex, end));
        }
        return Optional.empty();
    }
}
The resulting list of countries ends up in resultCountriesList. The code walks over inputString, each time taking the substring from currentIndex (where the previous substring ended) up to the next occurrence of the " symbol. If a substring is present, we process it. The boolean flag processingMultiWordsCountry tells the countries enclosed in " apart from those outside the quotes: it flips after every segment, so segments outside quotes are split on spaces and segments inside quotes are added whole.
So, at least for now, I can't find anything better. I don't think this code is ideal and there is probably a lot of room for improvement, so if you see any, feel free to add a comment. Hope it helps, have a nice day!
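For comparison, the same quote-flag idea can also be written as a single pass over the characters. This is only a sketch under the assumption that quotes are always balanced and never nested. The names QuoteAwareSplitter, split and flush are hypothetical and not taken from the code above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class illustrating a one-pass, flag-based split
class QuoteAwareSplitter {
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                // A quote only toggles the flag; it is never part of a name
                insideQuotes = !insideQuotes;
            } else if (Character.isWhitespace(c) && !insideQuotes) {
                // Whitespace outside quotes ends the current name
                flush(current, tokens);
            } else {
                current.append(c);
            }
        }
        // The last name may not be followed by a separator
        flush(current, tokens);
        return tokens;
    }

    // Adds the buffered name to the list (if any) and clears the buffer
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}
```

Unlike the regex version, this keeps any non-whitespace character inside a name, so hyphens or accented letters survive, at the cost of not validating the input at all.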
