| String | Regex | Result | Row |
|---|---|---|---|
| ccaaabb | `[^a]+` | cc | 1 |
| aaabb | `[^a]+` | bb | 2 |
| AbbbAcc | `[^A]` | b | 3 |
| AbbbAcc | `[^A]?` | (empty) | 4 |
| AbbbAcc | `[^A]*` | (empty) | 5 |
| AbbbAcc | `([^A]+){1}` | bbb | 6 |
| AbbbAcc | `([^A]+){2}` | b | 7 |
Row 1: Scanning from left to right, `cc` is matched and output, but `bb` is not. Why?
Row 2: The engine seems to skip every `a`, then reaches `bb`, which is matched and output. In row 1, the `a`s are not skipped and the `b`s are not reached. Why the difference?
Rows 3-5: Why do the results differ?
Rows 6-7: Why do the results differ?
Please explain, character by character and space by space, how the engine works.
The results were obtained by testing the strings and regexes on https://onlinetexttools.com/extract-regex-matches-from-text
CodePudding user response:
1. A regular expression matches a contiguous substring. So it starts matching at the first `c` and stops when it gets to `a`, and the match is just `cc`. `bb` is not included because it's separated from `cc`.
2. It skips over the `a` characters because they don't match, and matches `bb` after them.
3. It matches the first character that isn't `A`, so it matches the first `b`.
4. Since you've made the pattern optional with `?`, it will match an empty string. It matches the empty string at the beginning of the string, and returns an empty match.
5. `*` means zero or more matches of the pattern. There are zero matches at the very beginning, so it returns that empty string.
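If you want to check items 1-5 yourself, here is a minimal sketch using Python's `re` module rather than the online tool, so treat it as an approximation of what that tool does. It assumes rows 1 and 2 were written with a `+` quantifier (`[^a]+`), which is the only way the reported `cc` and `bb` results make sense. The helper name `first_match` is made up for this example.

```python
import re

# (string, pattern) pairs for rows 1-5.
# Assumption: rows 1 and 2 use `[^a]+`, inferred from the reported results.
ROWS = [
    ("ccaaabb", r"[^a]+"),  # row 1 -> (0, 'cc')
    ("aaabb", r"[^a]+"),    # row 2 -> (3, 'bb')
    ("AbbbAcc", r"[^A]"),   # row 3 -> (1, 'b')
    ("AbbbAcc", r"[^A]?"),  # row 4 -> (0, '')
    ("AbbbAcc", r"[^A]*"),  # row 5 -> (0, '')
]


def first_match(text, pattern):
    """Return (start_index, matched_text) for the leftmost match, or None."""
    m = re.search(pattern, text)
    return (m.start(), m.group()) if m else None


for row, (text, pattern) in enumerate(ROWS, start=1):
    print(row, text, pattern, first_match(text, pattern))
```

Rows 4 and 5 both report a start index of 0 with an empty match: the pattern is allowed to match nothing, so the very first position already succeeds.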
6-7. You seem to be printing what the capture group matched, not the entire regexp. When you quantify a capture group, it captures the last repetition.
In 6, the capture group only has to match 1 time, so it can capture everything that `[^A]+` matches, which is `bbb`.
But in 7, the capture group has to match twice. If the first repetition matched `bbb`, there would be nothing for the second repetition to match, so it backtracks. Now the first repetition matches `bb` and the second matches `b`, and the latter is the value of the capture group.
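The difference between the whole match and the capture group can be seen directly. This is a small sketch in Python's `re` (not the online tool), assuming the patterns in rows 6 and 7 are `([^A]+){1}` and `([^A]+){2}`. The helper `whole_and_group` is a hypothetical name for illustration.

```python
import re


def whole_and_group(pattern, text):
    """Return (entire match, last value captured by group 1)."""
    m = re.search(pattern, text)
    return m.group(0), m.group(1)


# One repetition: the group captures everything `[^A]+` matched.
print(whole_and_group(r"([^A]+){1}", "AbbbAcc"))  # ('bbb', 'bbb')

# Two repetitions: the whole match is still 'bbb', but the group
# only remembers the last repetition, which is 'b'.
print(whole_and_group(r"([^A]+){2}", "AbbbAcc"))  # ('bbb', 'b')
```

So a tool that prints group 1 instead of group 0 would show `b` for row 7 even though the regex as a whole matched `bbb`.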
CodePudding user response:
1. `ccbb` is not a substring of `ccaaabb`, so it can't possibly be matched.
2. It fails to match starting at position 0, so it tries starting at subsequent positions until it finally finds a match at position 3.
3. It fails to match starting at position 0, so it tries starting at subsequent positions until it finally finds a match at position 1.
4. It finds a matching substring at position 0. It doesn't need to look further.
5. It finds a matching substring at position 0. It doesn't need to look further.
6. It fails to match starting at position 0, so it tries starting at subsequent positions until it finally finds a match at position 1.
7. It fails to match starting at position 0, so it tries starting at subsequent positions until it finally finds a match at position 1. It actually matches `bbb`, not `b` as you claim.
   - In the first attempt, `[^A]+` first matches `bbb`. But then `[^A]+` can't match a second time. So it tries matching less.
   - In the second attempt, `[^A]+` first matches `bb`, then `b`. The last match is what's captured.
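The "try position 0, then position 1, and so on" behaviour can be imitated by hand. The sketch below is only an illustration of that outer loop, written with Python's `re`; a real engine does this internally (and usually with optimizations), and the function name `scan` is invented for this example. It assumes row 2 used `[^a]+`.

```python
import re


def scan(pattern, text):
    """Try an anchored match at each start position; return the first hit.

    Returns (start_position, matched_text), or None if no position works.
    """
    compiled = re.compile(pattern)
    for pos in range(len(text) + 1):
        # Pattern.match(text, pos) only succeeds if the match starts exactly at pos.
        m = compiled.match(text, pos)
        if m is not None:
            return pos, m.group()
    return None


print(scan(r"[^a]+", "aaabb"))    # (3, 'bb')  fails at 0, 1, 2
print(scan(r"[^A]", "AbbbAcc"))   # (1, 'b')   fails at 0
print(scan(r"[^A]?", "AbbbAcc"))  # (0, '')    empty match succeeds at once
```

The loop stops at the first position where the pattern succeeds, which is why an optional pattern never gets as far as the `b` characters.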
