I have the regex (?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9-_]+)(?!\w).
Given the string @first@nope @second@Hello @my-friend, email@ [email protected] @friend, how can I exclude @first and @second, since they are not whole words on their own?
In other words, they should be excluded because they are immediately followed by @.
CodePudding user response:
You can use
(?<![a-zA-Z0-9_.-])@(?=([A-Za-z]+[A-Za-z0-9_-]*))\1(?![@\w])
(?a)(?<![\w.-])@(?=([A-Za-z][\w-]*))\1(?![@\w])
See the regex demo. Details:
(?<![a-zA-Z0-9_.-]) - a negative lookbehind that matches a location that is not immediately preceded by an ASCII letter, digit, _, . or -
@ - a @ char
(?=([A-Za-z]+[A-Za-z0-9_-]*)) - a positive lookahead with a capturing group inside that captures one or more ASCII letters and then zero or more ASCII letters, digits, - or _ chars
\1 - the Group 1 value (backreferences are atomic, no backtracking is allowed through them)
(?![@\w]) - a negative lookahead that fails the match if there is a word char (letter, digit or _) or a @ char immediately to the right of the current location.
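To make the breakdown above concrete, here is a minimal Python sketch that runs the first pattern against the string from the question. The names TEXT, MENTION and find_mentions are made up for illustration, and the sample string is copied as it appears in the question.

```python
import re

# Sample text copied from the question.
TEXT = "@first@nope @second@Hello @my-friend, email@ [email protected] @friend"

MENTION = re.compile(
    r"(?<![a-zA-Z0-9_.-])"            # not preceded by an ASCII letter, digit, _, . or -
    r"@"                              # the literal @
    r"(?=([A-Za-z]+[A-Za-z0-9_-]*))"  # capture the whole name inside a lookahead
    r"\1"                             # consume exactly what was captured
    r"(?![@\w])"                      # not followed by @ or a word char
)


def find_mentions(text):
    """Return the full matches, including the leading @."""
    return [m.group(0) for m in MENTION.finditer(text)]


print(find_mentions(TEXT))  # ['@my-friend', '@friend']
```

@first and @second are rejected because the captured name is directly followed by @, and since the name was captured inside a lookahead, the engine cannot shorten it to make the final check pass.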
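Note that I put the hyphens at the end of the character classes. This is best practice, because a hyphen in that position is always literal and cannot be mistaken for a range operator.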
The (?a)(?<![\w.-])@(?=([A-Za-z][\w-]*))\1(?![@\w]) alternative uses shorthand character classes and the (?a) inline modifier (the equivalent of re.ASCII / re.A), which makes \w match only ASCII chars (as in the original version). Remove (?a) if you want to match any Unicode letters and digits.
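The effect of (?a) is easiest to see on a name with a non-ASCII letter. The following is a small sketch, with a sample string and variable names invented for illustration:

```python
import re

# Same pattern body, compiled with and without the ASCII inline flag.
BODY = r"(?<![\w.-])@(?=([A-Za-z][\w-]*))\1(?![@\w])"
ascii_only = re.compile("(?a)" + BODY)
unicode_aware = re.compile(BODY)

sample = "@Jos\u00e9 @anna"  # "@José @anna", with a precomposed e-acute

ascii_hits = [m.group(0) for m in ascii_only.finditer(sample)]
unicode_hits = [m.group(0) for m in unicode_aware.finditer(sample)]

print(ascii_hits)    # ['@Jos', '@anna']
print(unicode_hits)  # ['@José', '@anna']
```

With (?a), the accented letter is not a word char, so the name is cut short right before it and the trailing lookahead still passes. Without the flag, \w covers the accented letter and the whole name is matched.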
CodePudding user response:
Another option is to assert a whitespace boundary to the left, and assert no word char or @ sign to the right.
(?<!\S)@([A-Za-z]+[\w-]+)(?![@\w])
The pattern matches:
(?<!\S) - negative lookbehind, assert that there is no non-whitespace char directly to the left
@ - match literally
([A-Za-z]+[\w-]+) - capture group 1, match one or more chars A-Za-z and then one or more word chars or -
(?![@\w]) - negative lookahead, assert that there is no @ or word char directly to the right
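As a quick check, this sketch applies the whitespace-boundary pattern to the string from the question. The names TEXT, pattern and names are illustrative only.

```python
import re

# Sample text copied from the question.
TEXT = "@first@nope @second@Hello @my-friend, email@ [email protected] @friend"

pattern = re.compile(r"(?<!\S)@([A-Za-z]+[\w-]+)(?![@\w])")

# With a single capture group, findall returns the group 1 values,
# that is, the names without the leading @.
names = pattern.findall(TEXT)
print(names)  # ['my-friend', 'friend']
```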
Alternatively, match a non-word boundary \B before the @ instead of using the lookbehind.
\B@([A-Za-z]+[\w-]+)(?![@\w])
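The two variants are not fully interchangeable: \B only requires that the char before @ is not a word char, so punctuation is allowed there, while (?<!\S) requires whitespace or the start of the string. A short sketch of the difference, using a made-up sample string:

```python
import re

LOOKBEHIND = re.compile(r"(?<!\S)@([A-Za-z]+[\w-]+)(?![@\w])")
NON_BOUNDARY = re.compile(r"\B@([A-Za-z]+[\w-]+)(?![@\w])")

# Hypothetical input where each @ is preceded by punctuation.
punctuated = "(@alice) .@bob"

strict = LOOKBEHIND.findall(punctuated)
loose = NON_BOUNDARY.findall(punctuated)

print(strict)  # []
print(loose)   # ['alice', 'bob']
```

Which one fits depends on whether mentions directly after punctuation such as ( should count.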
