Match words that do not contain apostrophes

Time:01-23

I am able to match words that contain apostrophes with ([\w]+['][\w]+). Now I want the opposite: words that do not contain apostrophes. For example, the pattern above applied to this input:

 test's to see if this work's 123abc

Result:

test's
work's

Now I want strings and digits separated by spaces, tabs, newlines, and most punctuation marks. I tried this: (?!\w+')[\w]+. This gives me the following (same sample text as above):

s
to
see
if
this
s
123abc

I want the output to be this:

to
see
if
this
123abc

CodePudding user response:

You can try this regex:

(?<!['])\b\w+\b(?!['])

Explanation:

  • (?<![']) - negative lookbehind that matches a position which is not immediately preceded by a '
  • \b - matches a word boundary, i.e., the position between a word character and a non-word character
  • \w+ - matches 1 or more occurrences of word characters [a-zA-Z0-9_]
  • \b - matches a word boundary, i.e., the position between a word character and a non-word character
  • (?![']) - negative lookahead that matches a position which is not immediately followed by a '
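
As a quick check, the pattern can be run through Python's re.findall. This is only a minimal sketch: the names `text` and `pattern` are illustrative and not part of the answer, and the `+` quantifier after `\w` is assumed from the "1 or more occurrences" wording in the explanation.

```python
import re

# Sample sentence from the question.
text = "test's to see if this work's 123abc"

# Whole words that are neither preceded nor followed by an apostrophe.
pattern = re.compile(r"(?<!')\b\w+\b(?!')")

matches = pattern.findall(text)
print(matches)  # ['to', 'see', 'if', 'this', '123abc']
```

The lookbehind rejects the trailing "s" in "test's", and the lookahead together with the closing \b rejects "test" and "work", so only the apostrophe-free words remain.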

CodePudding user response:

Here is an re.findall approach using an alternation:

import re
inp = "test's to see if this work's 123abc"
# Words with an apostrophe match the first alternative and leave the group empty.
words = [x for x in re.findall(r"\w+'\w+|(\w+)", inp) if x != '']
print(words)  # ['to', 'see', 'if', 'this', '123abc']

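The idea is to match words containing an apostrophe first, without capturing them, and then to match and capture the remaining words. Matches of the first alternative leave the capture group empty, so they come back as '' and are filtered out.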
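
To see why the filter is needed, it may help to look at what re.findall returns before the empty strings are dropped. This is a small sketch under the assumption that the pattern uses `+` quantifiers; the names `raw` and `words` are illustrative.

```python
import re

inp = "test's to see if this work's 123abc"

# With exactly one capture group, findall returns that group's text for each
# match. When the first alternative (a word with an apostrophe) matches, the
# group takes no part in the match, so findall reports an empty string.
raw = re.findall(r"\w+'\w+|(\w+)", inp)
print(raw)    # ['', 'to', 'see', 'if', 'this', '', '123abc']

# Dropping the empty strings leaves only the apostrophe-free words.
words = [w for w in raw if w]
print(words)  # ['to', 'see', 'if', 'this', '123abc']
```

Putting the apostrophe alternative first matters: it consumes "test's" as a whole before the plain `\w+` alternative gets a chance to match part of it.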

CodePudding user response:

I got it:

\w(?![\w]+['][\w]+)([a-z0-9]+)
