Regex: return only the message blocks, i.e. strings that start with "message" and include everything between the parent message's braces

Time:02-08

I want to extract only the message definitions: each result should start at "message" and include everything between the curly braces of that parent (top-level) message. With the code below, I also get the service definition along with the messages, which I don't want. Any suggestions? Thanks in advance.

import java.util.Arrays;
import java.util.List;

String data = "/**\r\n" +
        " * file\r\n" +
        " */\r\n" +
        "syntax = \"proto3\";\r\n" +
        "package demo;\r\n" +
        "\r\n" +
        "import \"envoyproxy/protoc-gen-validate/validate/validate.proto\";\r\n" +
        "import \"google/api/annotations.proto\";\r\n" +
        "import \"google/protobuf/wrappers.proto\";\r\n" +
        "import \"protoc-gen-swagger/options/annotations.proto\";\r\n" +
        "\r\n" +
        "option go_package = \"bitbucket.com;\r\n" +
        "option java_multiple_files = true;\r\n" +
        "\r\n" +
        "schemes: HTTPS;\r\n" +
        "consumes: \"application/json\";\r\n" +
        "produces: \"application/json\";\r\n" +
        "responses: {\r\n" +
        "key:\r\n" +
        "    \"404\";\r\n" +
        "value: {\r\n" +
        "description:\r\n" +
        "    \"not exist.\";\r\n" +
        "schema: {\r\n" +
        "json_schema: {\r\n" +
        "type:\r\n" +
        "    STRING;\r\n" +
        "}\r\n" +
        "}\r\n" +
        "}\r\n" +
        "}\r\n" +
        "responses: {\r\n" +
        "key:\r\n" +
        "    \"401\";\r\n" +
        "value: {\r\n" +
        "description:\r\n" +
        "    \"Wrong user.\";\r\n" +
        "schema: {\r\n" +
        "json_schema: {\r\n" +
        "type:\r\n" +
        "    STRING;\r\n" +
        "};\r\n" +
        "example: {\r\n" +
        "value:\r\n" +
        "    '{ \"message\": \"wrong user.\" }'\r\n" +
        "}\r\n" +
        "}\r\n" +
        "}\r\n" +
        "}\r\n" +
        "\r\n" +
        "message message1 {\r\n" +
        "    message message2 {\r\n" +
        "        enum Enum {\r\n" +
        "            UNKNOWN = 0;    \r\n" +
        "        }\r\n" +
        "    }\r\n" +
        "    string id = 1;\r\n" +
        "    string name = 3;\r\n" +
        "    string account = 4;\r\n" +
        "}\r\n" +
        "\r\n" +
        "message User{\r\n" +
        "   string firstName = 1 ;\r\n" +
        "   string lastName  = 2 ;\r\n" +
        "   string middleName  = 3 [(validate.rules).repeated = { min_items: 0 }];\r\n" +
        "}\r\n" +
        "\r\n" +
        "service Userlogin{\r\n" +
        "   rpc Login(User) returns (APIResponse);\r\n" +
        "}";
List<String> allmsg = Arrays.asList(data.replaceAll("(?sm)\\A.*?(?=message)", "").split("\\R (?=message)"));
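A quick sketch on a trimmed, hypothetical copy of the data (the class name WhyAttemptFails is made up) suggests why this attempt also returns the service. The lookahead (?=message) first hits the word "message" inside the JSON example '{ "message": ... }', not the first message declaration. Then \R (?=message) only splits where a line break is followed by exactly one space and then "message", which never occurs in this data, so split returns everything as a single element.

```java
import java.util.Arrays;
import java.util.List;

public class WhyAttemptFails {
    // Hypothetical trimmed sample: keeps the JSON example line, two messages and the service.
    static final String DATA = "value:\r\n"
            + "    '{ \"message\": \"wrong user.\" }'\r\n"
            + "}\r\n"
            + "\r\n"
            + "message message1 {\r\n"
            + "    string id = 1;\r\n"
            + "}\r\n"
            + "\r\n"
            + "message User{\r\n"
            + "   string firstName = 1 ;\r\n"
            + "}\r\n"
            + "\r\n"
            + "service Userlogin{\r\n"
            + "}";

    // Same expression as in the question.
    static List<String> questionAttempt(String data) {
        return Arrays.asList(data.replaceAll("(?sm)\\A.*?(?=message)", "")
                .split("\\R (?=message)"));
    }

    public static void main(String[] args) {
        List<String> parts = questionAttempt(DATA);
        // Only one element: the split pattern never matches.
        System.out.println(parts.size());
        // The remainder starts inside the JSON example, not at "message message1".
        System.out.println(parts.get(0).startsWith("message\": \"wrong user."));
        // ...and it still contains the service.
        System.out.println(parts.get(0).contains("service Userlogin"));
    }
}
```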

I expect the resulting list of strings to have size 2, like this:

allmsg.get(0) should be:

message message1 {
    message message2 {
        enum Enum {
            UNKNOWN = 0;    
        }
    }
    string id = 1;
    string name = 3;
    string account = 4;
}

allmsg.get(1) should be:

message User{
    string firstName = 1 ;
    string lastName  = 2 ;
    string middleName  = 3 [(validate.rules).repeated = { min_items: 0 }];
}

Answer:

Use a Pattern that matches one whole "message" block and stream the match results into a List (Matcher.results() requires Java 9 or later):

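import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import static java.util.stream.Collectors.toList;
List<String> allmsg = Pattern.compile("(?ms)^message.*?^}")
  .matcher(data)
  .results() // stream the MatchResults (Java 9+)
  .map(MatchResult::group) // get the entire match
  .collect(toList()); // collect as a List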

See live code demo.
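A self-contained version of the snippet above, as a sketch: the class name ExtractMessages and the trimmed sample are hypothetical, Java 9+ is assumed, and it relies on the closing brace of each top-level message sitting at column 0. The sample keeps the tricky '{ "message": ... }' line and the nested message2 to show they don't confuse the pattern.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import static java.util.stream.Collectors.toList;

public class ExtractMessages {
    // Hypothetical trimmed copy of the question's data (same \r\n line endings).
    // Assumption: each top-level message's closing "}" starts a line.
    static final String DATA = "syntax = \"proto3\";\r\n"
            + "example: {\r\n"
            + "value:\r\n"
            + "    '{ \"message\": \"wrong user.\" }'\r\n"
            + "}\r\n"
            + "\r\n"
            + "message message1 {\r\n"
            + "    message message2 {\r\n"
            + "        enum Enum {\r\n"
            + "            UNKNOWN = 0;\r\n"
            + "        }\r\n"
            + "    }\r\n"
            + "    string id = 1;\r\n"
            + "}\r\n"
            + "\r\n"
            + "message User{\r\n"
            + "   string firstName = 1 ;\r\n"
            + "}\r\n"
            + "\r\n"
            + "service Userlogin{\r\n"
            + "   rpc Login(User) returns (APIResponse);\r\n"
            + "}";

    // The answer's pattern: from a line starting with "message" to the next line starting with "}".
    static List<String> extractMessages(String data) {
        return Pattern.compile("(?ms)^message.*?^}")
                .matcher(data)
                .results()
                .map(MatchResult::group)
                .collect(toList());
    }

    public static void main(String[] args) {
        List<String> allmsg = extractMessages(DATA);
        System.out.println(allmsg.size()); // 2
        allmsg.forEach(m -> System.out.println("----\n" + m));
    }
}
```

In Java's MULTILINE mode ^ treats \r\n as a single line terminator, so the Windows line endings in the data are not a problem.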

Regex breakdown:

  • (?ms) turns on two flags: s (DOTALL), which makes the dot also match newlines, and m (MULTILINE), which makes ^ and $ match at the start and end of each line
  • ^message matches the start of a line (not just the start of input, thanks to the m flag), then "message"
  • .*? reluctantly (i.e. as few as possible) matches any characters (including newlines, thanks to the s flag). Making the quantifier reluctant with ? ends the match at the first line that starts with "}", so a single match cannot swallow several "messages"
  • ^} matches the start of a line (again thanks to the m flag), then "}"

See live regex demo.
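To see why the reluctant ? matters, here is a sketch comparing the greedy and reluctant versions on a small hypothetical sample (class and method names are made up; Java 9+ assumed for Matcher.results()):

```java
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical sample: two messages followed by a service, closing braces at column 0.
    static final String DATA = "message message1 {\n"
            + "    string id = 1;\n"
            + "}\n"
            + "\n"
            + "message User{\n"
            + "   string firstName = 1 ;\n"
            + "}\n"
            + "\n"
            + "service Userlogin{\n"
            + "   rpc Login(User) returns (APIResponse);\n"
            + "}";

    // Number of non-overlapping matches of regex in data.
    static long countMatches(String regex, String data) {
        return Pattern.compile(regex).matcher(data).results().count();
    }

    public static void main(String[] args) {
        // Greedy .* backtracks only as far as the LAST line starting with "}" (the service's).
        System.out.println(countMatches("(?ms)^message.*^}", DATA));  // 1
        // Reluctant .*? stops at the FIRST line starting with "}".
        System.out.println(countMatches("(?ms)^message.*?^}", DATA)); // 2
    }
}
```

With the greedy .* the single match runs from the first "message" to the service's closing brace, which is the same kind of over-matching the question complains about.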

This works even if the "messages" are not contiguous, i.e. they may be interspersed with other constructs (your example doesn't have this situation, but the linked demos do).
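On Java 8, where Matcher.results() is not available, a plain find() loop with the same pattern should do the same job. The sketch below (hypothetical class name and input) also places a service between two messages to illustrate the non-contiguous case:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InterspersedMessages {
    private static final Pattern MESSAGE = Pattern.compile("(?ms)^message.*?^}");

    // Hypothetical input: a service sits between two messages.
    static final String DATA = "message A {\n"
            + "  string id = 1;\n"
            + "}\n"
            + "\n"
            + "service S {\n"
            + "  rpc Get(A) returns (A);\n"
            + "}\n"
            + "\n"
            + "message B {\n"
            + "  string name = 1;\n"
            + "}\n";

    // Java 8-compatible equivalent of the stream version: collect each find() match.
    static List<String> extractMessages(String data) {
        List<String> result = new ArrayList<>();
        Matcher m = MESSAGE.matcher(data);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the two messages; the service is skipped because no match starts there.
        extractMessages(DATA).forEach(System.out::println);
    }
}
```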

Answer:

You should look at your other question.

Pattern.compile("(?s)^message(.(?!message|service))*");

Note that "message" can appear directly after "message", as in:

"message message1 {\r\n"

You must adapt the regex.
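A sketch of what that remark means in practice, running the suggested pattern on a small hypothetical sample (class and method names are made up). As written, without the m flag, ^ only matches at the start of the input, so nothing is found when the data begins with other text. With m added, the first match stops right after "message", because the negative lookahead rejects the following space, which is followed by "message1".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondAnswerRegex {
    // Hypothetical sample; note the line before the first message.
    static final String DATA = "syntax = \"proto3\";\n"
            + "message message1 {\n"
            + "    string id = 1;\n"
            + "}\n"
            + "message User{\n"
            + "   string firstName = 1 ;\n"
            + "}\n"
            + "service Userlogin{\n"
            + "}";

    // Collects every find() match of regex in data.
    static List<String> allMatches(String regex, String data) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(data);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // As written: ^ is anchored to the start of input, which is "syntax".
        System.out.println(allMatches("(?s)^message(.(?!message|service))*", DATA));  // []
        // With m: first match is just "message"; the second covers "message User{ ... }".
        System.out.println(allMatches("(?ms)^message(.(?!message|service))*", DATA).get(0)); // message
    }
}
```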
