I'm trying to parse an input string into tokens, where each token is a word in the string.
However, I also want tokens to be able to contain spaces, and for clearer syntax, I'd like quotes to be able to appear halfway through a token, and to be able to escape quotes (\").
Example input strings and the outputs I'd want (the quotes that would normally mark the output strings are left out for readability):
- Input: diamond_sword name:"test name" -> Output: [diamond_sword, name:test name]
- Input: stick 1 name:"The \"Holy\" Stick" -> Output: [stick, 1, name:The "Holy" Stick]
Unlike many previous questions on this topic, I don't want the quotes to have to be separate from other words (as in name:"string"), and I want only escaped quotes to remain, with all unescaped quotes removed.
Is this possible? What would it look like to turn a string into a list this way?
CodePudding user response:
Maybe something like this?
import java.util.*;

public class Demo {
    private static List<String> parse(String in) {
        Objects.requireNonNull(in);
        char[] chars = in.toCharArray();
        var words = new ArrayList<String>();
        var sb = new StringBuilder();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == ' ') {
                // Space; add the current token to the result list.
                words.add(sb.toString());
                sb.setLength(0);
            } else if (chars[i] == '"') {
                // Iterate until the next unescaped quote
                // (Assumes strings are well-formatted; a more robust version
                // wouldn't and would better handle error cases)
                for (i++; chars[i] != '"'; i++) {
                    // If the current character is a backslash, skip it and
                    // append the next character instead
                    if (chars[i] == '\\') {
                        i++;
                    }
                    sb.append(chars[i]);
                }
            } else {
                sb.append(chars[i]);
            }
        }
        words.add(sb.toString()); // Don't forget the final token
        return words;
    }

    public static void main(String[] args) {
        List<String> strings =
            List.of("diamond_sword name:\"test name\"",
                    "stick 1 name:\"The \\\"Holy\\\" Stick\"");
        for (String s : strings) {
            List<String> words = parse(s);
            System.out.println(words);
        }
    }
}
When compiled and run, this prints:
[diamond_sword, name:test name]
[stick, 1, name:The "Holy" Stick]
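
The comment in the code above mentions that a more robust version would handle badly formatted input. Here is one possible sketch of that, not the only way to do it. The class name RobustTokenizer and the method tokenize are made up for this example. It also makes a few assumptions that go beyond the question: runs of spaces are collapsed rather than producing empty tokens, only \" is treated as an escape (any other backslash is kept as-is), and an unterminated quote is reported with an IllegalArgumentException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical helper for illustration; not part of the original answer.
final class RobustTokenizer {

    static List<String> tokenize(String in) {
        Objects.requireNonNull(in, "in");
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // true while between two unescaped quotes
        boolean hasToken = false; // true once a token has started (it may be empty, e.g. "")

        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '\\' && i + 1 < in.length() && in.charAt(i + 1) == '"') {
                // Escaped quote: keep the quote, drop the backslash.
                current.append('"');
                hasToken = true;
                i++;
            } else if (c == '"') {
                // Unescaped quote: toggle quoting and drop the quote itself.
                inQuotes = !inQuotes;
                hasToken = true;
            } else if (c == ' ' && !inQuotes) {
                // Space outside quotes ends the current token, if there is one.
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                // Ordinary character, or a space inside quotes.
                current.append(c);
                hasToken = true;
            }
        }
        if (inQuotes) {
            // Assumption: treat a missing closing quote as an error.
            throw new IllegalArgumentException("Unterminated quote in: " + in);
        }
        if (hasToken) {
            tokens.add(current.toString()); // final token
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [diamond_sword, name:test name]
        System.out.println(tokenize("diamond_sword name:\"test name\""));
        // prints [stick, 1, name:The "Holy" Stick]
        System.out.println(tokenize("stick 1 name:\"The \\\"Holy\\\" Stick\""));
    }
}
```

Tracking the quote state with a flag instead of a nested loop means the index can never run past the end of the input, so a missing closing quote turns into a clear error rather than an ArrayIndexOutOfBoundsException. If you would rather accept unterminated quotes, you could drop the inQuotes check at the end.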
