I have the following string, which would be part of a file name: [Cast1, Cast2, Cast 3]. The string is comma-delimited. It would be at the end of a film title and be preceded by either a - or a ~.
The filename would look like this:
(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]
The section in bold could be optional.
I need a regex to get the following. I know this can be done with string splitting, but I need it done with a regex:
- Cast1
- Cast2
- Cast 3
I would like this to be in a named group. So far I have ((?P<CAST>([^,] ))), but it includes the opening and closing brackets.
On top of this
CodePudding user response:
If I understand what you are looking for, try:
[-~]\s*\[(?P<CAST>[^\]]*)\]
- [-~] matches '-' or '~'.
- \s* matches zero or more whitespace characters.
- \[ matches '['.
- (?P<CAST>[^\]]*) matches zero or more characters that are not ']' and captures them in the named capture group CAST.
- \] matches ']'.
So the above will capture whatever is between the '[' and ']' characters following a '-' or '~' whether those characters contain commas or not. You cannot have 3 capture groups identically named CAST. If you want the individual components of the cast, you will have to do it with string splitting:
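import re

s = '(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]'
# Capture everything between '[' and ']' that follows a '-' or '~'.
m = re.search(r'[-~]\s*\[(?P<CAST>[^\]]*)\]', s)
if m:
    cast = m.group('CAST')
    # Split the captured text on commas, dropping the whitespace after each one.
    print(re.split(r',\s*', cast))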
Prints:
['Cast1', 'Cast2', 'Cast 3']
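If the splitting itself also has to be done by a regex, one possible approach with only the built-in re module is two passes: the first finds the bracketed list, the second walks over it with a named group. This is only a sketch. The names cast_members, BRACKET, MEMBER and CAST_LIST are made up for illustration, and it assumes that cast names never contain a comma or a ']'.

```python
import re

# Pass 1: the bracketed list after '-' or '~' (same pattern as above,
# with the group renamed to CAST_LIST for this sketch).
BRACKET = re.compile(r'[-~]\s*\[(?P<CAST_LIST>[^\]]*)\]')
# Pass 2: one cast member per match, with surrounding whitespace left out.
MEMBER = re.compile(r'\s*(?P<CAST>[^,]+?)\s*(?:,|$)')


def cast_members(filename):
    """Return the cast names, or an empty list if there is no cast section."""
    m = BRACKET.search(filename)
    if m is None:
        return []
    return [c.group('CAST') for c in MEMBER.finditer(m.group('CAST_LIST'))]


print(cast_members('(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]'))
print(cast_members('(Studio) - Title (Year)'))
```

Each match of the second pattern has its own CAST group, so there is never more than one group named CAST in a single pattern. A filename without the optional cast section simply gives an empty list.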
If you were running Python 3, you could install the regex module from the PyPI repository, which has far more capabilities than the built-in re module, and then you could execute:
import regex

s = '(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]'
# \G continues from the end of the previous match; \K drops what was matched so far.
for m in regex.finditer(r'(?:[-~]\s*\[|\G(?!\A))\K\s*(?P<CAST>[^,\]]*)(?:[,\]])', s):
    print(m['CAST'])
Prints:
Cast1
Cast2
Cast 3
But what does that buy you?
