Ok, I'm trying to catch text using a regex with the following rules:
- Each new line starts with the word `type` or `tag`, and `:` comes after that. | `type` or `tag` should be the capture group 1
- A varchar might come after `:` | That varchar should be the capture group 2
- `\\` comes after that
- A number comes after `\\` | That number should be the capture group 3
- `?` might come after the number
- If we have `?`, a varchar might come after `?` | That varchar should be the capture group 4
- If we have `?` and a varchar, then `:` might come after that
- If we have `?`, a varchar and `:`, then a varchar might come after that | That varchar should be the capture group 5
Examples:
type:test\\1?value12:value9 // Should get: Group 1 = type, Group 2 = test, Group 3 = 1, Group 4 = value12, Group 5 = value9
type:\\22?value62:value3 // Should get: Group 1 = type, Group 2 = NULL, Group 3 = 22, Group 4 = value62, Group 5 = value3
My regex is:
/(type|tag):([^\\]+)?\\\\([0-9]{1,3})?\??([^\:]+):([^\:]+)?/i
I believe that it's not accurate, for example:
type:\\1p?hello:iii
The current regex matches `1` as Group 3 and `p?hello` as Group 4; however, it should not match this input at all. Group 3 must be a number, and only `?` may come directly after it, so `type:\\1p?hello:iii` doesn't follow the format that we want.
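The behaviour described above can be reproduced with a short check. This is only a sketch, assuming the pattern is used as a JavaScript regex literal with a `+` quantifier after each character class; the names `questionRegex`, `malformed` and `found` are made up for illustration.

```javascript
// The pattern from the question: unanchored, with most parts optional.
const questionRegex = /(type|tag):([^\\]+)?\\\\([0-9]{1,3})?\??([^\:]+):([^\:]+)?/i;

// String.raw keeps both backslashes exactly as typed.
const malformed = String.raw`type:\\1p?hello:iii`;

const found = malformed.match(questionRegex);

// Without the g flag, match() returns the full match plus the capture groups.
console.log(found && found.slice(1));
// Group 3 is "1" and group 4 is "p?hello", so the malformed line is accepted.
```

The `[^\:]+` class is the reason `p?hello` ends up in group 4: it accepts any character except a colon, including `p` and `?`.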
Can anyone help, please? Thanks!
CodePudding user response:
Try this
/(type|tag):(\w+)?\\\\([0-9]{1,3})?\??(\w+)?:?(\w+)?/gi
I think it's better to match word characters with \w than to just exclude the other characters.
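A quick check of this pattern, offered as a sketch rather than a verdict: it has no `^` or `$` anchors and everything after the backslashes is optional, so it still appears to find a match inside the malformed input from the question. The snippet assumes a `+` after each `\w`; `wordRegex`, `wellFormed`, `brokenLine` and `groupsOf` are illustrative names only.

```javascript
const wordRegex = /(type|tag):(\w+)?\\\\([0-9]{1,3})?\??(\w+)?:?(\w+)?/gi;

// String.raw keeps both backslashes exactly as typed.
const wellFormed = String.raw`type:test\\1?value12:value9`;
const brokenLine = String.raw`type:\\1p?hello:iii`;

// matchAll needs the g flag and yields the capture groups of every match.
// Index 0 of each result is the full match, indexes 1 to 5 are the groups.
const groupsOf = (text) => [...text.matchAll(wordRegex)].map((m) => m.slice(0));

console.log(groupsOf(wellFormed)); // one match covering the whole line
console.log(groupsOf(brokenLine)); // one match: the prefix "type:\\1p", with "p" in group 4
```

If whole-line validation is the goal, anchoring the pattern and making the `?` part a single optional group is one way to reject such input.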
CodePudding user response:
You can use
/^(type|tag):([a-zA-Z0-9]*)\\\\([0-9]{1,3})(?:\?([a-zA-Z0-9]+)(?::([a-zA-Z0-9]+))?)?$/i
See the regex demo. Details:
- `^` - start of string
- `(type|tag)` - `type` or `tag`
- `:` - a colon
- `([a-zA-Z0-9]*)` - Group 2: zero or more alphanumeric chars
- `\\\\` - two backslashes
- `([0-9]{1,3})` - Group 3: one, two or three digits
- `(?:\?([a-zA-Z0-9]+)(?::([a-zA-Z0-9]+))?)?` - an optional sequence of
  - `\?` - a `?` char
  - `([a-zA-Z0-9]+)` - Group 4: one or more alphanumeric chars
  - `(?::([a-zA-Z0-9]+))?` - an optional sequence of
    - `:` - a colon
    - `([a-zA-Z0-9]+)` - Group 5: one or more alphanumeric chars
- `$` - end of string.
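To see how the anchored pattern behaves on the inputs from the question, here is a minimal sketch in JavaScript. It assumes each line is tested on its own; `lineRegex` and `parseLine` are hypothetical names introduced only for this example.

```javascript
const lineRegex = /^(type|tag):([a-zA-Z0-9]*)\\\\([0-9]{1,3})(?:\?([a-zA-Z0-9]+)(?::([a-zA-Z0-9]+))?)?$/i;

// Hypothetical helper: returns the five capture groups as an array,
// or null when the line does not follow the format.
function parseLine(line) {
  const m = lineRegex.exec(line);
  return m ? m.slice(1) : null;
}

// String.raw keeps both backslashes exactly as typed.
console.log(parseLine(String.raw`type:test\\1?value12:value9`)); // ["type", "test", "1", "value12", "value9"]
console.log(parseLine(String.raw`type:\\22?value62:value3`));    // ["type", "", "22", "value62", "value3"]
console.log(parseLine(String.raw`type:\\1p?hello:iii`));         // null
```

Two details are worth noting. Group 2 comes back as an empty string rather than a missing value when nothing follows the colon, because the `*` quantifier sits inside the group. And if the input is one multi-line string rather than separate lines, adding the `m` flag makes `^` and `$` match at line boundaries.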
