I want to split strings like:
(so) what (are you trying to say)
what (do you mean)
Into lists like:
[(so), what, (are you trying to say)]
[what, (do you mean)]
The code that I tried is below. On regexr, the regular expression matches the parts that I want but gives a warning. I'm not an expert in regex, so I don't know what I'm doing wrong.
import re
string = "(so) what (are you trying to say)?"
rx = re.compile(r"((\([\w \w]*\)|[\w]*))")
print(re.split(rx, string ))
CodePudding user response:
Using [\w \w]* is the same as [\w ]* and also matches an empty string.
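As a quick check of that point, here is a minimal sketch using only Python's standard re module. It shows that the two character classes behave the same, that the pattern accepts an empty string, and that re.split keeps captured text in its result, which is one more reason the original attempt does not give a clean list.

```python
import re

# Repeating \w inside a character class adds nothing: both classes mean
# "a word character or a space".
assert re.fullmatch(r"[\w \w]*", "so what") is not None
assert re.fullmatch(r"[\w ]*", "so what") is not None

# The * quantifier allows zero repetitions, so the empty string matches too.
empty_match = re.fullmatch(r"[\w ]*", "")
print(empty_match is not None)  # True

# re.split includes the text of capture groups in the returned list.
print(re.split(r"(-)", "a-b"))  # ['a', '-', 'b']
```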
Instead of using split, you can use re.findall without any capture groups and write the pattern like:
\(\w+(?:[^\S\n]+\w+)*\)|\w+
\( Match (
\w+ Match 1+ word chars
(?:[^\S\n]+\w+)* Optionally repeat matching 1+ whitespace chars (no newlines) and 1+ word chars
\) Match )
| Or
\w+ Match 1+ word chars
import re
string = "(so) what (are you trying to say)? what (do you mean)"
rx = re.compile(r"\(\w+(?:[^\S\n]+\w+)*\)|\w+")
print(re.findall(rx, string))
Output
['(so)', 'what', '(are you trying to say)', 'what', '(do you mean)']
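To get one list per input string, as the question asks, the findall approach could be wrapped in a small helper. This is only a sketch: the names TOKEN_RX and tokenize are hypothetical and chosen for illustration.

```python
import re

# Matches either a parenthesized run of words or a single bare word.
TOKEN_RX = re.compile(r"\(\w+(?:[^\S\n]+\w+)*\)|\w+")

def tokenize(text):
    """Return parenthesized groups and bare words in order of appearance."""
    return TOKEN_RX.findall(text)

print(tokenize("(so) what (are you trying to say)"))
# ['(so)', 'what', '(are you trying to say)']
print(tokenize("what (do you mean)"))
# ['what', '(do you mean)']
```

Anything the pattern does not match, such as a trailing question mark, is simply left out of the result.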
CodePudding user response:
For your two examples you can write:
re.split(r'(?<=\)) +| +(?=\()', string)
This does not work, however, for the string defined in the OP's code, which ends with a question mark. Neither of the two examples in the question contains one, so the code and the stated examples disagree.
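The behaviour described here can be checked with a short script. It is a sketch that writes the split pattern with + quantifiers for "one or more spaces"; the name SPLIT_RX is hypothetical.

```python
import re

# Split on runs of spaces that directly follow ')' or directly precede '('.
SPLIT_RX = re.compile(r"(?<=\)) +| +(?=\()")

examples = [
    "(so) what (are you trying to say)",
    "what (do you mean)",
    "(so) what (are you trying to say)?",  # the string from the question's code
]
for text in examples:
    print(SPLIT_RX.split(text))
# ['(so)', 'what', '(are you trying to say)']
# ['what', '(do you mean)']
# ['(so)', 'what', '(are you trying to say)?']
```

In the third case the question mark stays attached to the last element, because the pattern only splits on spaces next to a parenthesis.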
The regular expression can be broken down as follows.
(?<=\))  # positive lookbehind asserts that location in the
         # string is preceded by ')'
[ ]+     # match one or more spaces
|        # or
[ ]+     # match one or more spaces
(?=\()   # positive lookahead asserts that location in the
         # string is followed by '('
In the above I've put each of the two space characters in a character class merely to make them visible.
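A commented layout like this can also be compiled as written by passing Python's re.VERBOSE flag. The sketch below assumes that flag. In verbose mode, whitespace outside a character class is ignored, so there the [ ] form is required, not just a visual aid.

```python
import re

# With re.VERBOSE, unescaped whitespace in the pattern is ignored and '#'
# starts a comment, so literal spaces are written as [ ].
rx = re.compile(
    r"""
    (?<=\))   # positive lookbehind: preceded by ')'
    [ ]+      # one or more spaces
    |         # or
    [ ]+      # one or more spaces
    (?=\()    # positive lookahead: followed by '('
    """,
    re.VERBOSE,
)

parts = rx.split("what (do you mean)")
print(parts)  # ['what', '(do you mean)']
```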
