I would like to split a string at spaces (and colons), except inside curly brackets and round brackets (parentheses). Similar questions have been asked, but their answers fail with nested brackets.
Here is an example of a string to split:
p1: I/out p2: (('mean', 5), 0.0, ('std', 2)) p3: 7 p4: {'name': 'check', 'value': 80.0}
The actual goal is to obtain a list of keys (p1, p2, p3 and p4) along with their values. When I split the string at spaces and colons, I can avoid splitting inside the curly brackets, but I cannot avoid splitting at some spaces inside the round brackets, because those brackets are nested.
The closest I got is
[\s:]+(?=[^\{\(\)\}]*(?:[\{\(]|$))
which is fine except that it splits between (('mean', 5), and 0.0.
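To make the failure concrete, here is a minimal sketch that runs that attempt with the built-in re module. It assumes a + quantifier follows the [\s:] character class; the variable names are illustrative only.

```python
import re

text = "p1: I/out p2: (('mean', 5), 0.0, ('std', 2)) p3: 7 p4: {'name': 'check', 'value': 80.0}"

# Assumption: "+" follows the character class, so ": " is consumed as one separator.
# The lookahead only checks that the next bracket character is an opening one
# (or that the string ends); it knows nothing about nesting depth.
lookahead_pattern = r"[\s:]+(?=[^{()}]*(?:[{(]|$))"

parts = re.split(lookahead_pattern, text)
print(parts)
```

The braces survive because every separator inside them is followed by a closing brace first. Inside the outer parentheses, however, any space whose next bracket is an opening one still looks like a valid split point, so the nested value is cut into pieces.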
Answer:
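You can use the following pattern, which works in PCRE and in the Python regex module from PyPI (it relies on recursion, which the built-in re module does not support):
(?:(\((?:[^()]++|(?1))*\))|(\{(?:[^{}]++|(?2))*})|[^\s:])+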
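It matches:
(?: - start of a non-capturing container group:
    (\((?:[^()]++|(?1))*\)) - Group 1: a substring between balanced, possibly nested, round brackets; (?1) recurses the Group 1 pattern
    | - or
    (\{(?:[^{}]++|(?2))*}) - Group 2: a substring between balanced, possibly nested, braces; (?2) recurses the Group 2 pattern
    | - or
    [^\s:] - a character other than whitespace and a colon
)+ - end of the group, repeated one or more times.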
See the Python demo:
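import regex  # third-party module from PyPI (pip install regex); the built-in re lacks (?1) recursion

text = "p1: I/out p2: (('mean', 5), 0.0, ('std', 2)) p3: 7 p4: {'name': 'check', 'value': 80.0}"
# Balanced (...) or {...} groups, or single non-space/non-colon chars, repeated one or more times
pattern = r"(?:(\((?:[^()]++|(?1))*\))|(\{(?:[^{}]++|(?2))*})|[^\s:])+"
print([m.group() for m in regex.finditer(pattern, text)])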
Output:
['p1', 'I/out', 'p2', "(('mean', 5), 0.0, ('std', 2))", 'p3', '7', 'p4', "{'name': 'check', 'value': 80.0}"]
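If installing the regex module is not an option, the same tokens can be produced without any regular expression by tracking bracket depth by hand. This is a different technique from the recursive pattern above, shown only as a minimal sketch: split_top_level and params are hypothetical names, and the code assumes brackets are balanced and never appear inside the quoted strings.

```python
def split_top_level(text, separators=" \t\n:"):
    """Split text at separator characters that are outside any (...) or {...} pair."""
    closers = {"(": ")", "{": "}"}
    stack = []    # closing brackets still expected, innermost last
    tokens = []
    current = []  # characters of the token being built
    for ch in text:
        if ch in closers:
            stack.append(closers[ch])
            current.append(ch)
        elif stack and ch == stack[-1]:
            stack.pop()
            current.append(ch)
        elif not stack and ch in separators:
            # Top-level separator: finish the current token, if any.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


text = "p1: I/out p2: (('mean', 5), 0.0, ('std', 2)) p3: 7 p4: {'name': 'check', 'value': 80.0}"
tokens = split_top_level(text)

# Tokens alternate key, value, key, value, ... so pair them up.
params = dict(zip(tokens[0::2], tokens[1::2]))
print(params)
```

Pairing the even-indexed tokens with the odd-indexed ones gives the keys with their values, which was the actual goal of the question. The values remain strings; converting them to Python objects would be a separate step.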
