I want to extract every top-level message from a .proto file. Each match should start at the message keyword and include everything between the curly braces of that parent message, nested blocks included. With the pattern below, I do not get the whole parent body.
String data = "syntax = \"proto3\";\r\n"
+ "package grpc;\r\n"
+ "\r\n"
+ "import \"envoyproxy/protoc-gen-validate/validate/validate.proto\";\r\n"
+ "import \"google/api/annotations.proto\";\r\n"
+ "import \"google/protobuf/wrappers.proto\";\r\n"
+ "import \"protoc-gen-swagger/options/annotations.proto\";\r\n"
+ "\r\n"
+ "message Acc {\r\n"
+ " message AccErr {\r\n"
+ " enum Enum {\r\n"
+ " UNKNOWN = 0;\r\n"
+ " CASH = 1;\r\n"
+ " }\r\n"
+ " }\r\n"
+ " string account_id = 1;\r\n"
+ " string name = 3;\r\n"
+ " string account_type = 4;\r\n"
+ "}\r\n"
+ "\r\n"
+ "message Name {\r\n"
+ " string firstname = 1;\r\n"
+ " string lastname = 2;\r\n"
+ "}";
List<String> allMessages = new ArrayList<>();
Pattern pattern = Pattern.compile("message[^\\}]*\\}");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
String str = matcher.group();
allMessages.add(str);
System.out.println(str);
}
}
I expect my ArrayList of strings to have size 2, with the contents shown below.
allMessages.get(0) should be:
message Acc {
message AccErr {
enum Enum {
UNKNOWN = 0;
CASH = 1;
}
}
string account_id = 1;
string name = 3;
string account_type = 4;
}
and allMessages.get(1) should be:
message Name {
string firstname = 1;
string lastname = 2;
}
CodePudding user response:
First remove everything before the first "message", then split on one or more line breaks that are followed by "message" (the line breaks are part of the split pattern, so the blank lines between parent messages are consumed):
String[] messages = data.replaceAll("(?sm)\\A.*?(?=message)", "").split("\\R+(?=message)");
If you actually need a List<String>, pass that result to Arrays.asList():
List<String> messages = Arrays.asList(data.replaceAll("(?sm)\\A.*?(?=message)", "").split("\\R+(?=message)"));
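For reference, here is a minimal, self-contained sketch of the replace-then-split idea. The class name SplitTopLevelMessages and the method name extract are hypothetical, and the sample input is shortened. It assumes the split pattern is \\R+(?=message), that nested messages are indented (so they never directly follow a line break), and that the word "message" does not appear before the first top-level message.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
final class SplitTopLevelMessages {

    // Drops everything before the first "message", then splits on one or more
    // line breaks that are immediately followed by "message".
    // Assumption: nested messages are indented, so the lookahead skips them.
    static List<String> extract(String proto) {
        String body = proto.replaceAll("(?s)\\A.*?(?=message)", "");
        return Arrays.asList(body.split("\\R+(?=message)"));
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question.
        String data = "syntax = \"proto3\";\r\n"
                + "package grpc;\r\n"
                + "\r\n"
                + "message Acc {\r\n"
                + "  message AccErr {\r\n"
                + "  }\r\n"
                + "  string account_id = 1;\r\n"
                + "}\r\n"
                + "\r\n"
                + "message Name {\r\n"
                + "  string firstname = 1;\r\n"
                + "}";
        List<String> messages = extract(data);
        System.out.println("count = " + messages.size());
        messages.forEach(System.out::println);
    }
}
```

Because the split consumes the line breaks between parent messages, each element starts with message and ends with its closing brace. A comment or import containing the word "message" ahead of the first definition would break the first step, so treat this as a sketch rather than a proto parser.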
CodePudding user response:
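Try this regex. It anchors on message at the start of a line and uses a positive lookahead to stop at the next top-level message or at the end of the input. The message text itself is in capture group 1, so read it with matcher.group(1) rather than matcher.group().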
Pattern.compile("(?s)\r\n(message.*?)(?=(\r\n)+message|$)")
// or, to accept both \r\n and \n line endings
Pattern.compile("(?s)\r?\n(message.*?)(?=(\r?\n)+message|$)")
No splitting, parsing, or managing nested braces either :)
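A minimal sketch of how the lookahead pattern could be plugged into a find loop follows. The class name LookaheadTopLevelMessages and the method name extract are hypothetical, and the sample input is shortened. It assumes the lookahead is (\r?\n)+message (one or more line breaks, then message), that nested messages are indented, and that the first top-level message is preceded by a line break, since the pattern requires one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class LookaheadTopLevelMessages {

    // (?s) lets '.' cross line breaks. Group 1 lazily grows from a "message"
    // at the start of a line until the next unindented "message" or the end.
    private static final Pattern TOP_LEVEL =
            Pattern.compile("(?s)\\r?\\n(message.*?)(?=(\\r?\\n)+message|$)");

    static List<String> extract(String proto) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOP_LEVEL.matcher(proto);
        while (matcher.find()) {
            // group(1) excludes the leading line break that group() would include.
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question.
        String data = "syntax = \"proto3\";\r\n"
                + "package grpc;\r\n"
                + "\r\n"
                + "message Acc {\r\n"
                + "  message AccErr {\r\n"
                + "  }\r\n"
                + "  string account_id = 1;\r\n"
                + "}\r\n"
                + "\r\n"
                + "message Name {\r\n"
                + "  string firstname = 1;\r\n"
                + "}";
        List<String> messages = extract(data);
        System.out.println("count = " + messages.size());
        messages.forEach(System.out::println);
    }
}
```

Like the split approach, this relies on indentation to tell nested messages from top-level ones, so it would misbehave on a file where a nested message starts in column zero.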
