String1: {{word1|word2|word3 (word4 word5)|word6}}
String2: {{word1|word2|word3|word6}}
With this regex:
(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?=\}\})
I can capture the parts of String2 as groups. How can I change the regex so that (word4 word5) in String1 is also captured as a group?
CodePudding user response:
You can add a (?:\s*(\([^()]*\)))? subpattern:
(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?:\s*(\([^()]*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})
See the regex demo.
The (?:\s*(\([^()]*\)))? part is an optional non-capturing group that matches one or zero occurrences of:
\s* - zero or more whitespace characters
( - start of a capturing group:
\( - a ( char
[^()]* - zero or more chars other than ( and )
\) - a ) char
) - end of the capturing group.
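To see which groups the pattern above produces for both strings, here is a minimal Python sketch. It assumes Python's re module (the question does not name a regex flavor), and the helper name capture is made up for illustration.

```python
import re

# The pattern from this answer: an optional "(...)" group after the third word.
PATTERN = re.compile(
    r"(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)"
    r"(?:\s*(\([^()]*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})"
)


def capture(text):
    """Return the five groups as a tuple, or None if the text does not match.

    `capture` is a hypothetical helper, not part of the original answer.
    """
    m = PATTERN.search(text)
    return m.groups() if m else None


string1 = "{{word1|word2|word3 (word4 word5)|word6}}"
string2 = "{{word1|word2|word3|word6}}"

print(capture(string1))  # ('word1', 'word2', 'word3', '(word4 word5)', 'word6')
print(capture(string2))  # ('word1', 'word2', 'word3', None, 'word6')
```

Because the parenthesized part is optional, group 4 is None for String2, so the group numbering stays the same for both strings.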
If you need to make sure only whitespace-separated words are allowed inside the parentheses, replace [^()]* with \w+(?:\s+\w+)* so that the optional subpattern becomes (?:\s*(\(\w+(?:\s+\w+)*\)))?:
(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?:\s*(\(\w+(?:\s+\w+)*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})
See this regex demo.
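The difference between the two variants shows up when the parentheses contain something other than words. The sketch below again assumes Python's re module; WORDS, build_pattern, LOOSE and STRICT are illustrative names, and the sample string with a comma is an invented test case, not one from the question.

```python
import re

WORDS = r"\w+(?:\s+\w+)*"  # one or more whitespace-separated words


def build_pattern(inside_parens):
    """Assemble the full regex with the given subpattern between the parentheses.

    Hypothetical helper; it only concatenates the pieces shown in this answer.
    """
    return re.compile(
        r"(?<=\{\{)"
        + "(" + WORDS + r")\|(" + WORDS + r")\|(" + WORDS + ")"
        + r"(?:\s*(\(" + inside_parens + r"\)))?"
        + r"\|(" + WORDS + r")(?=\}\})"
    )


LOOSE = build_pattern(r"[^()]*")  # anything except parentheses
STRICT = build_pattern(WORDS)     # only whitespace-separated words

sample = "{{word1|word2|word3 (word4, word5)|word6}}"  # note the comma

loose_match = LOOSE.search(sample)
strict_match = STRICT.search(sample)

print(loose_match.group(4))  # (word4, word5)
print(strict_match)          # None
```

With the strict variant the whole match fails on the comma, since nothing else in the pattern can consume the parenthesized text.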
CodePudding user response:
You could simplify the expression by matching the desired substrings rather than capturing them. For that you could use the following regular expression.
(?<=[{| ])\w+(?=[}| ])|\([\w ]+\)
The elements of the expression are as follows.
(?<=      # begin a positive lookbehind
  [{| ]   # match one of the indicated characters
)         # end the positive lookbehind
\w+       # match one or more word characters
(?=       # begin a positive lookahead
  [}| ]   # match one of the indicated characters
)         # end the positive lookahead
|         # or
\(        # match a literal "("
[\w ]+    # match one or more of the indicated characters
\)        # match a literal ")"
Note that this does not validate the format of the string.
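As a rough illustration of the matching approach, here is a short Python sketch using re.findall. The name TOKEN and the last sample string are assumptions added for the example.

```python
import re

# Match each word (or parenthesized run of words) instead of capturing groups.
TOKEN = re.compile(r"(?<=[{| ])\w+(?=[}| ])|\([\w ]+\)")

string1 = "{{word1|word2|word3 (word4 word5)|word6}}"
string2 = "{{word1|word2|word3|word6}}"

print(TOKEN.findall(string1))  # ['word1', 'word2', 'word3', '(word4 word5)', 'word6']
print(TOKEN.findall(string2))  # ['word1', 'word2', 'word3', 'word6']

# No validation: a word between spaces matches even without any braces.
print(TOKEN.findall("plain word1 text"))  # ['word1']
```

The last call shows why this is not a format check: any word that sits between the listed delimiter characters is returned, whether or not the surrounding {{...}} is present.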
