Split a list on spaces and group quoted characters

Time:01-06

I'm trying to parse an input string into tokens, where each token is a word in the string. However, I also want tokens to be able to contain spaces. For clearer syntax, I'd like quotes to be able to appear halfway through a token, and I'd like to be able to escape quotes (\").

Example input strings and the output I want (the quotes that would delimit each output string are omitted for readability):

  • Input: diamond_sword name:"test name" -> Output: [diamond_sword, name:test name]
  • Input: stick 1 name:"The \"Holy\" Stick" -> Output: [stick, 1, name:The "Holy" Stick]

Unlike what many previous questions have asked for, I don't want quoted sections to have to be separated from other words (name:"string" should be a single token). I also want only escaped quotes to remain in the output; all unescaped quotes should be removed.

Is this possible? What would it look like to turn a string into a list this way?
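For comparison, the token rules above can be written fairly directly as a regular expression: a token is one or more characters that are either plain (not whitespace, not a quote) or a complete quoted segment, so a quote may open mid-token. The sketch below is an alternative regex-based technique, not the answer's method; the class name RegexTokenizer is hypothetical, it assumes Java 11+ (String.isBlank, Matcher.replaceAll with a function), it splits on any whitespace rather than only ' ', and it treats a backslash as an escape only inside quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenizer {
    // A quoted segment: opening quote, then any mix of characters that are
    // neither a quote nor a backslash, or backslash escapes, then a closing
    // quote. Group 1 captures the contents between the quotes.
    private static final String QUOTED = "\"((?:[^\"\\\\]|\\\\.)*)\"";

    // A token: one or more plain characters (not whitespace, not a quote)
    // or whole quoted segments, so quotes may start halfway through a token.
    private static final Pattern TOKEN =
        Pattern.compile("(?:[^\\s\"]|" + QUOTED + ")+");

    private static final Pattern QUOTED_SEGMENT = Pattern.compile(QUOTED);

    static List<String> parse(String in) {
        List<String> words = new ArrayList<>();
        Matcher m = TOKEN.matcher(in);
        int prevEnd = 0;
        while (m.find()) {
            // Anything skipped between tokens must be whitespace; otherwise
            // the regex stepped over a stray (e.g. unterminated) quote.
            if (!in.substring(prevEnd, m.start()).isBlank()) {
                throw new IllegalArgumentException("Unbalanced quote near index " + prevEnd);
            }
            words.add(unquote(m.group()));
            prevEnd = m.end();
        }
        if (!in.substring(prevEnd).isBlank()) {
            throw new IllegalArgumentException("Unbalanced quote near index " + prevEnd);
        }
        return words;
    }

    // Replace each quoted segment with its contents, turning \x into x.
    private static String unquote(String token) {
        return QUOTED_SEGMENT.matcher(token).replaceAll(r ->
            Matcher.quoteReplacement(r.group(1).replaceAll("\\\\(.)", "$1")));
    }

    public static void main(String[] args) {
        System.out.println(parse("diamond_sword name:\"test name\""));
        System.out.println(parse("stick 1 name:\"The \\\"Holy\\\" Stick\""));
    }
}
```

The explicit gap check matters because Matcher.find() silently skips characters it cannot match; without it, an input like a"b would quietly come out as two tokens instead of being reported as malformed.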

Answer:

Maybe something like this?

import java.util.*;

public class Demo {
    private static List<String> parse(String in) {
        Objects.requireNonNull(in);
        char[] chars = in.toCharArray();
        var words = new ArrayList<String>();
        var sb = new StringBuilder();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == ' ') {
                // Space; add the current token to the result list.
                // (Consecutive spaces produce empty tokens.)
                words.add(sb.toString());
                sb.setLength(0);
            } else if (chars[i] == '"') {
                // Iterate until the next unescaped quote.
                // (Assumes strings are well-formed: an unterminated quote or a
                //  trailing backslash throws ArrayIndexOutOfBoundsException.
                //  A more robust version would check bounds and report errors.)
                for (i++; chars[i] != '"'; i++) {
                    // If the current character is a backslash, skip it and
                    // append the next character verbatim
                    if (chars[i] == '\\') {
                        i++;
                    }
                    sb.append(chars[i]);
                }
            } else {
                sb.append(chars[i]);
            }
        }
        words.add(sb.toString()); // Don't forget the final token
        return words;
    }

    public static void main(String[] args) {
        List<String> strings =
            List.of("diamond_sword name:\"test name\"",
                    "stick 1 name:\"The \\\"Holy\\\" Stick\"");

        for (String s : strings) {
            List<String> words = parse(s);
            System.out.println(words);
        }
    }
}

When compiled and run (Java 10 or later, because of var), it prints:

[diamond_sword, name:test name]
[stick, 1, name:The "Holy" Stick]
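The comment in the quote-handling branch notes that a more robust version would handle error cases. One possible sketch of that follows; its behavior choices are assumptions rather than anything the question specifies: it splits on any whitespace (Character.isWhitespace) and ignores runs of it, keeps an explicit empty token for "", and throws IllegalArgumentException for an unterminated quote or a dangling backslash. The class name StrictTokenizer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class StrictTokenizer {
    static List<String> parse(String in) {
        Objects.requireNonNull(in);
        List<String> words = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        boolean inToken = false; // true once the current token has started, even as ""
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends a token; runs of whitespace yield no empty tokens.
                if (inToken) {
                    words.add(sb.toString());
                    sb.setLength(0);
                    inToken = false;
                }
            } else if (c == '"') {
                inToken = true;
                int start = i;
                // Copy characters up to the matching unescaped quote.
                for (i++; ; i++) {
                    if (i >= in.length()) {
                        throw new IllegalArgumentException(
                            "Unterminated quote starting at index " + start);
                    }
                    char q = in.charAt(i);
                    if (q == '"') {
                        break; // closing quote; the outer loop steps past it
                    }
                    if (q == '\\') {
                        i++; // skip the backslash and keep the next character verbatim
                        if (i >= in.length()) {
                            throw new IllegalArgumentException("Dangling backslash at end of input");
                        }
                        q = in.charAt(i);
                    }
                    sb.append(q);
                }
            } else {
                inToken = true;
                sb.append(c);
            }
        }
        if (inToken) {
            words.add(sb.toString()); // the final token
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(parse("diamond_sword   name:\"test name\""));
        System.out.println(parse("stick 1 name:\"The \\\"Holy\\\" Stick\""));
        System.out.println(parse("empty \"\" token"));
    }
}
```

Run as-is, this should print [diamond_sword, name:test name], then [stick, 1, name:The "Holy" Stick], then [empty, , token]. The inToken flag is what lets "" survive as a real empty token while extra spaces are still dropped.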