Match words that do not contain apostrophes

I can match words that contain an apostrophe with ([\w]+['][\w]+), but now I want the words that do not contain one. For example, the regex above applied to this input:

 test's to see if this work's 123abc

Result:

test's
work's

Now I want words and digits separated by spaces, tabs, newlines, and most punctuation marks. I tried (?!\w+')[\w]+, which gives me the following (same sample text as above):

s
to
see
if
this
s
123abc

I want the output to be this:

to
see
if
this
123abc
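The behavior described in the question can be reproduced with a short Python sketch, assuming the quantifiers in the attempted pattern are `+` and that Python's `re` module is the engine in use (the question does not say which one is). The lookahead is only checked at the position where a match starts, so the engine is free to start a new match right after the apostrophe, which is where the stray `s` comes from.

```python
import re

sample = "test's to see if this work's 123abc"

# Attempt from the question, assuming `+` quantifiers.
# The negative lookahead rejects start positions inside "test" and "work",
# but not the position just after the apostrophe.
attempt = re.findall(r"(?!\w+')[\w]+", sample)
print(attempt)  # ['s', 'to', 'see', 'if', 'this', 's', '123abc']
```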

CodePudding user response：

You can try this regex:

(?<!['])\b\w+\b(?!['])


Explanation:

(?<![']) - negative lookbehind: asserts that the current position is not immediately preceded by a '
\b - matches a word boundary, i.e. the position between a word character and a non-word character
\w+ - matches one or more word characters [a-zA-Z0-9_]
\b - matches a word boundary, i.e. the position between a word character and a non-word character
(?![']) - negative lookahead: asserts that the current position is not immediately followed by a '
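As a minimal sketch, the lookaround pattern explained above can be tried in Python like this. The helper name `plain_words` and the second test string are made up for illustration; the pattern is written with `+` after `\w`, as the explanation describes.

```python
import re

# A word that is neither preceded nor followed by an apostrophe.
PLAIN_WORD = re.compile(r"(?<!')\b\w+\b(?!')")

def plain_words(text):
    """Return the words in `text` that do not touch an apostrophe."""
    return PLAIN_WORD.findall(text)

print(plain_words("test's to see if this work's 123abc"))
# ['to', 'see', 'if', 'this', '123abc']

# Tabs, newlines, and punctuation also act as separators.
print(plain_words("don't stop,\tgo\nnow!"))
# ['stop', 'go', 'now']
```

Because both halves of a word such as test's sit next to the apostrophe, neither half can match on its own.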

CodePudding user response：

Here is an re.findall approach using an alternation:

import re

inp = "test's to see if this work's 123abc"
words = [x for x in re.findall(r"\w+'\w+|(\w+)", inp) if x != '']
print(words)  # ['to', 'see', 'if', 'this', '123abc']

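The idea is to first match, without capturing, the words that contain an apostrophe, and then to match and capture the remaining words. The apostrophe words come back from re.findall as empty strings, which the list comprehension filters out.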
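One possible variant of the same match-then-discard idea uses `re.finditer` and allows more than one apostrophe per word. This is a sketch rather than part of the answer above: the `(?:'\w+)+` repetition and the helper name `words_without_apostrophes` are assumptions added for illustration.

```python
import re

# First alternative: a word with one or more apostrophe parts (not captured).
# Second alternative: any other word, captured in group 1.
PATTERN = re.compile(r"\w+(?:'\w+)+|(\w+)")

def words_without_apostrophes(text):
    # group(1) is None whenever the first alternative matched.
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

print(words_without_apostrophes("test's to see if this work's 123abc"))
# ['to', 'see', 'if', 'this', '123abc']

print(words_without_apostrophes("rock'n'roll is fun"))
# ['is', 'fun']
```

With the single-apostrophe pattern, a word like rock'n'roll would be consumed only up to rock'n, leaving roll to be captured as a plain word.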

CodePudding user response：

I got it:

\w(?![\w]+['][\w]+)([a-z0-9]+)