How do I implement both a lookahead (without replacement) and a non-lookahead in the same regex statement?
I want to split up a sentence such as:
"ad1 cow run sick ag2 4 8 6 9 crap2 ag lag pag arg2 8 6 5"
into
ad1 cow run sick
ag2 4 8 6 9
crap2 ag lag pag
arg2 8 6 5
Here is the statement that almost gets me there with a lookahead:
"(?=\\s\\w\\w*\\d)"
That is, it looks for a whitespace character, a word character, any number of further word characters, and then a digit. Here is what I get with that:
ad1 cow run sick
 ag2 4 8 6 9
 crap2 ag lag pag
 arg2 8 6 5
Notice that the leading spaces are still there, because a lookahead does not consume them. How do I remove those spaces as well in the same regex statement?
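The question does not name a language, but the doubled backslashes suggest a Java string literal. The following is a minimal sketch that assumes Java's `String.split` and reproduces the behaviour described above. The lookahead is zero-width, so the string is cut just before each matching space, and that space stays at the start of the next piece. The class and method names are hypothetical.

```java
public class LookaheadOnlySplit {
    // The question's pattern: a zero-width lookahead for
    // "whitespace, a word char, more word chars, then a digit".
    static final String PATTERN = "(?=\\s\\w\\w*\\d)";

    static String[] split(String s) {
        // split() cuts at every position where the lookahead succeeds,
        // but the lookahead consumes nothing, so the space is kept
        // as the first character of the following piece.
        return s.split(PATTERN);
    }

    public static void main(String[] args) {
        String input = "ad1 cow run sick ag2 4 8 6 9 crap2 ag lag pag arg2 8 6 5";
        for (String part : split(input)) {
            // Brackets make any leading space visible.
            System.out.println("[" + part + "]");
        }
    }
}
```

Running `main` prints each piece in brackets, which shows the leading space on the second, third and fourth pieces.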
Answer:
You can move the whitespace matching pattern out of the lookahead:
"\\s+(?=\\w+\\d)"
This way, the whitespace is consumed as part of the delimiter and is therefore removed during splitting.
Details
\s+ - one or more whitespace characters
(?=\w+\d) - a positive lookahead that matches a location immediately followed by one or more word chars and then a digit
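Here is a minimal Java sketch of this split, again assuming Java's `String.split`. The class and method names are hypothetical. Because `\\s+` sits outside the lookahead, the whitespace becomes the delimiter and is dropped, while the lookahead only checks what comes next without consuming it.

```java
public class ConsumingSplit {
    // \\s+ is consumed (and discarded) as the delimiter;
    // (?=\\w+\\d) only requires that the next token is
    // one or more word chars followed by a digit.
    static final String PATTERN = "\\s+(?=\\w+\\d)";

    static String[] split(String s) {
        return s.split(PATTERN);
    }

    public static void main(String[] args) {
        String input = "ad1 cow run sick ag2 4 8 6 9 crap2 ag lag pag arg2 8 6 5";
        for (String part : split(input)) {
            // Brackets show that no leading space remains.
            System.out.println("[" + part + "]");
        }
    }
}
```

One caveat: `\w` also matches digits, so a standalone multi-digit number such as `12` would satisfy `\w+\d` and would start a new piece as well.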
Answer:
You can also use your pattern as a match (note that \\w\\w* can be written as \\w+):
\\w+\\d.*?(?=\\s\\w+\\d|$)
Explanation
\\w+\\d - match 1+ word chars and a digit
.*? - match as few characters as possible
(?= - positive lookahead
  \\s\\w+\\d - match a whitespace char, 1+ word chars and a digit
  | - or
  $ - assert the end of the string
) - close lookahead
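Here is a minimal Java sketch of the match-based approach. It assumes that successive matches are collected with `java.util.regex.Matcher.find()`, and the class and method names are hypothetical. Each match stops right before the whitespace that precedes the next chunk, so the pieces come out without any leading or trailing spaces to trim.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchInsteadOfSplit {
    // A chunk starts with word chars ending in a digit, then lazily takes
    // as few characters as possible until the next "whitespace + word chars
    // + digit" (checked by the lookahead, not consumed) or the end of input.
    static final Pattern CHUNK = Pattern.compile("\\w+\\d.*?(?=\\s\\w+\\d|$)");

    static List<String> chunks(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = CHUNK.matcher(s);
        // Each find() resumes scanning where the previous match ended.
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "ad1 cow run sick ag2 4 8 6 9 crap2 ag lag pag arg2 8 6 5";
        chunks(input).forEach(System.out::println);
    }
}
```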
