I have the following text
<root>
<path>/my/data</path>
<paths>/global/data</paths>
</root>
and I'm trying to get a regex capture group that contains only /my/data and /global/data. I tried this:
^\s*(?=<path>|<paths>)(.*)$
but I don't understand why the (.*) groups are:
<path>/my/data</path>
<paths>/global/data</paths>
Is there any way to exclude the positive lookahead from the capture group?
CodePudding user response:
A lookahead is zero-width: it checks for text but does not consume it, so .* goes on to match the <path> and <paths> tags that the lookahead only tested for. In your regex, (?=<path>|<paths>) checks whether <path> or <paths> appears immediately to the right of the current location, and the regex index stays where it was. (.*) then starts from that same location and consumes the tag together with the rest of the line, since .* matches zero or more chars other than line break chars, as many as possible. Here, "consumes" means that the matched text is added to the overall match value and the regex index advances to the end of the current subpattern match.
Make the lookahead pattern consuming by turning it into a non-capturing group:
^\s*(?:<path>|<paths>)(.*)$
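To see the difference, here is a minimal sketch in Python that runs both patterns against the sample text from the question. It assumes Python's re module with re.MULTILINE so that ^ and $ anchor at each line. The third pattern, named strict, is not part of the answer above. It is a hypothetical variation that also matches the closing tag outside the group, because (.*)$ on its own still captures everything up to the end of the line.

```python
import re

# Sample text from the question.
text = """<root>
<path>/my/data</path>
<paths>/global/data</paths>
</root>"""

# Original attempt: the lookahead is zero-width, so (.*) starts at the tag.
lookahead = re.compile(r"^\s*(?=<path>|<paths>)(.*)$", re.MULTILINE)

# Suggested fix: the non-capturing group consumes the opening tag,
# so (.*) starts right after it.
consuming = re.compile(r"^\s*(?:<path>|<paths>)(.*)$", re.MULTILINE)

# Hypothetical variation (an assumption, not from the answer): match the
# closing tag outside the group so that only the path itself is captured.
strict = re.compile(r"^\s*<paths?>(.*?)</paths?>$", re.MULTILINE)

lookahead_groups = lookahead.findall(text)
consuming_groups = consuming.findall(text)
strict_groups = strict.findall(text)

print(lookahead_groups)  # ['<path>/my/data</path>', '<paths>/global/data</paths>']
print(consuming_groups)  # ['/my/data</path>', '/global/data</paths>']
print(strict_groups)     # ['/my/data', '/global/data']
```

With the non-capturing group the opening tag no longer appears in group 1, but the closing tag still does. If only /my/data and /global/data are wanted, something like the strict variant is one way to get there.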

