I am able to match words that contain apostrophes with ([\w]+['][\w]+), but now I want the words that do not contain apostrophes. For example, the pattern above applied to:
test's to see if this work's 123abc
Result:
test's
work's
Now I want the words and digits that are separated by spaces, tabs, newlines, and most punctuation marks. I tried this: (?!\w+')[\w]+. This gives me the following (same sample text as above):
s
to
see
if
this
s
123abc
I want the output to be this:
to
see
if
this
123abc
CodePudding user response:
You can try this regex:
(?<!['])\b\w+\b(?!['])
Explanation:
(?<![']) - negative lookbehind that matches a position which is not immediately preceded by a '
\b - matches a word boundary, i.e. the position between a word character and a non-word character
\w+ - matches 1 or more occurrences of word characters [a-zA-Z0-9_]
\b - matches a word boundary, i.e. the position between a word character and a non-word character
(?![']) - negative lookahead that matches a position which is not immediately followed by a '
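As a quick check, here is a minimal Python sketch that runs this pattern with re.findall on the sample string from the question. The variable names are only illustrative.

```python
import re

# Pattern from the answer above: a whole word that is neither
# immediately preceded nor immediately followed by an apostrophe.
pattern = re.compile(r"(?<!['])\b\w+\b(?!['])")

sample = "test's to see if this work's 123abc"
matches = pattern.findall(sample)
print(matches)  # ['to', 'see', 'if', 'this', '123abc']
```

Note that the lookarounds also skip words with a leading or trailing apostrophe, such as 'tis or dogs', which may or may not be what you want.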
CodePudding user response:
Here is an re.findall approach using an alternation:
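import re

inp = "test's to see if this work's 123abc"
# Apostrophe words match the first alternative and yield an empty group, which is filtered out.
words = [x for x in re.findall(r"\w+'\w+|(\w+)", inp) if x != '']
print(words) # ['to', 'see', 'if', 'this', '123abc']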
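The idea is to first match, but not capture, words with apostrophes, and otherwise to match and capture words without apostrophes. Because re.findall returns the capture group for each match, the apostrophe words come back as empty strings, which the list comprehension drops.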
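To make the mechanism visible, this small sketch prints what re.findall returns before the empty strings are filtered out. The names raw and words are only illustrative.

```python
import re

inp = "test's to see if this work's 123abc"

# Each match contributes its capture group. Apostrophe words are consumed
# by the first (ungrouped) alternative, so their group is an empty string.
raw = re.findall(r"\w+'\w+|(\w+)", inp)
print(raw)  # ['', 'to', 'see', 'if', 'this', '', '123abc']

# Dropping the empty strings leaves only the words without apostrophes.
words = [w for w in raw if w]
print(words)  # ['to', 'see', 'if', 'this', '123abc']
```

Putting the unwanted alternative first matters here: it consumes the whole apostrophe word, so the second alternative never sees the fragments on either side of the apostrophe.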
CodePudding user response:
I got it:
\w(?![\w]+['][\w]+)([a-z0-9]+)
