I'm trying to figure out how to split a string by comma, except when the comma is inside brackets and except when there is a dash directly before and/or after the comma. I have already found some good solutions for the bracket problem, but I have no idea how to extend them to my case.
Here is an example:
example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
aim = ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']
So far, I have managed to do the following:
>>> import re
>>> out = re.split(r',\s*(?![^()]*\))', example_string)
>>> out
['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-', 'Leistungsrechnung', 'Berufsausbildung', '-fortbildung']
Note the difference between aim and out for the terms 'Kosten-, Leistungsrechnung' and 'Berufsausbildung, -fortbildung'. I would be glad if someone could help me get output that looks like aim.
Thanks in advance!
Alex
CodePudding user response:
If you can use the third-party regex module (pip install regex), you could use:
\([^()]*\)(*SKIP)(*F)|(?<!-),(?!\s*-)\s*
The pattern matches:
\([^()]*\) - match from an opening parenthesis to the next closing parenthesis
(*SKIP)(*F) - discard that match and continue after it, so commas inside parentheses are never considered
| - or
(?<!-) - assert that there is no dash immediately to the left
, - match a comma
(?!\s*-) - assert that the comma is not followed by optional whitespace and a dash
\s* - match the whitespace after the comma so that it is removed from the results
import regex
example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
print(regex.split(r"\([^()]*\)(*SKIP)(*F)|(?<!-),(?!\s*-)\s*", example_string))
Output
['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']
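If installing the regex module is not an option, or if the parentheses can be nested (which the [^()]* part of the pattern does not handle), the same rule can be written as a plain character scan. This is only a sketch of a different technique, not part of the pattern above; the function name split_terms is made up for illustration, and it assumes "dash after the comma" may be separated from the comma by whitespace, as in the example string.

```python
def split_terms(text):
    """Split on commas, except inside parentheses or next to a dash."""
    parts = []
    start = 0   # index where the current term begins
    depth = 0   # current parenthesis nesting depth

    for i, ch in enumerate(text):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth = max(depth - 1, 0)  # tolerate a stray ')'
        elif ch == ',' and depth == 0:
            # Dash directly before the comma, e.g. 'Kosten-,'
            dash_before = i > 0 and text[i - 1] == '-'
            # Dash after the comma, allowing whitespace, e.g. ', -fortbildung'
            dash_after = text[i + 1:].lstrip().startswith('-')
            if not (dash_before or dash_after):
                parts.append(text[start:i].strip())
                start = i + 1

    parts.append(text[start:].strip())
    return parts


example_string = ('A-la-carte-Küche, Garnieren (Speisen, Getränke), '
                  'Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung')
print(split_terms(example_string))
```

For the example string this should give the same list as aim; unlike the regex, it also keeps 'a (b, (c, d), e)' together as one term.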
CodePudding user response:
You can use
re.split(r'(?<!-),(?!\s*-)\s*(?![^()]*\))', example_string)
Details:
(?<!-) - a negative lookbehind that fails the match if there is a - char immediately to the left of the current location
, - a comma
(?!\s*-) - a negative lookahead that fails the match if there are zero or more whitespaces followed by a - char immediately to the right of the current location
\s* - zero or more whitespaces
(?![^()]*\)) - a negative lookahead that fails the match if there are zero or more chars other than ) and ( and then a ) char immediately to the right of the current location.
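Put together as a complete script, the pattern can be checked against the list from the question. This is a minimal sketch using only the standard re module; the name SPLIT_PATTERN is chosen here for illustration, and the pattern assumes the parentheses are not nested.

```python
import re

# Split on a comma that is not preceded by a dash, not followed by
# optional whitespace plus a dash, and not followed by a ')' that comes
# before any other parenthesis (i.e. the comma is not inside "(...)").
SPLIT_PATTERN = re.compile(r'(?<!-),(?!\s*-)\s*(?![^()]*\))')

example_string = ('A-la-carte-Küche, Garnieren (Speisen, Getränke), '
                  'Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung')
aim = ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)',
       'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']

result = SPLIT_PATTERN.split(example_string)
print(result)
print(result == aim)  # True
```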
