I am trying to match all the words in the sentence:
"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.
I tried:
([A-Za-z\d(^\n$)]+('[A-Za-z]+)?)
but I don't want to match \nSo as a word, only So. In fact, I want to exclude all forms of whitespace, such as \n or \t.
My Julia code is:
sentence = """"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled."""
regex = r"([A-Za-z\d(^\n$)]+('[A-Za-z]+)?)"
v = [m.match for m in eachmatch(regex, sentence)]
CodePudding user response:
It turns out that \r, \n and \t are two-character combinations in your text (a backslash followed by a letter), not actual whitespace characters.
Since Julia uses PCRE, you can use a SKIP-FAIL regex here to easily ignore these combinations when matching:
\\[rnt](*SKIP)(*F)|\w+(?:['-]\w+)*
See the regex demo. Details:
\\[rnt](*SKIP)(*F) - a \ char and then either r, n or t, and then the matched chars are dropped, the match is failed and the engine starts looking for the next match from the failure position
| - or
\w+(?:['-]\w+)* - one or more word chars and then zero or more repetitions of ' or - followed by one or more word chars.
In Julia:
julia> sentence = """"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled."""
"\"That's the password: 'PASSWORD 123'!\", cried the Special Agent.\nSo I fled."
julia> regex = r"\\[rnt](*SKIP)(*F)|\w+(?:['-]\w+)*"
r"\\[rnt](*SKIP)(*F)|\w+(?:['-]\w+)*"
julia> v = [m.match for m in eachmatch(regex, sentence)]
12-element Vector{SubString{String}}:
"That's"
"the"
"password"
"PASSWORD"
"123"
"cried"
"the"
"Special"
"Agent"
"So"
"I"
"fled"
See the online Julia demo.
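To see what the SKIP-FAIL branch adds, here is a minimal sketch that compares the word pattern with and without it. It assumes the text really holds a backslash followed by a letter, which is simulated here with a raw"..." string. The shortened sentence and the variable names (word, skip_fail, plain, skipped) are only illustrative.

```julia
# Assumption: the text contains a literal backslash followed by n.
# raw"..." does no escape processing, so \n stays as two characters.
sentence = raw"cried the Special Agent.\nSo I fled."

# Sanity check: no real newline, only the two-character combination.
has_newline = '\n' in sentence               # false
has_backslash_n = occursin("\\n", sentence)  # true

# Same word pattern, without and with the SKIP-FAIL alternative.
word = r"\w+(?:['-]\w+)*"
skip_fail = r"\\[rnt](*SKIP)(*F)|\w+(?:['-]\w+)*"

plain = [m.match for m in eachmatch(word, sentence)]
skipped = [m.match for m in eachmatch(skip_fail, sentence)]

println(plain)    # the letter n sticks to the next word: "nSo"
println(skipped)  # the \n combination is skipped: "So"
```

If the string holds a real newline instead (as in a normal "..." literal with \n), both patterns should give the same words, because a newline is not a word character and \w+ never matches it.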
