Matching a Comma-Separated List within Brackets

I have the following string that would be part of a file name: [Cast1, Cast2, Cast 3]. The string is comma-delimited. It would appear at the end of a film title and be preceded by either a - or a ~.

The filename would look like this

(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]

The section in bold could be optional.

I need a regex to get the following. I know this can be done with string splitting, but I need it as a regex:

Cast1
Cast2
Cast 3

I would like each of these to be in a named group. So far I have ((?P<CAST>([^,]+))), but it includes the opening and closing brackets.
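A quick way to see where the brackets come from: [^,]+ only excludes commas, so '[' and ']' are treated as ordinary characters and end up inside the first and last pieces. A minimal sketch with the built-in re module, assuming the bracketed part has already been isolated into a string (the variable name cast_part is just for illustration):

```python
import re

cast_part = "[Cast1, Cast2, Cast 3]"

# [^,]+ matches any run of characters other than a comma, so the
# brackets and the space after each comma are kept in the pieces.
pieces = re.findall(r"[^,]+", cast_part)
print(pieces)  # ['[Cast1', ' Cast2', ' Cast 3]']
```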

On top of this

Answer:

If I understand what you are looking for, try:

[-~]\s*\[(?P<CAST>[^\]]*)\]


[-~] Matches '-' or '~'.
\s* Matches zero or more whitespace characters.
\[ Matches '['.
(?P<CAST>[^\]]*) Matches 0 or more characters that are not ']' and captures them in named capture group CAST.
\] Matches ']'.
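To check the pattern against the variations described in the question, here is a small sketch that runs it over a few hypothetical filenames: one using '~', one using '-', and one where the optional cast section is missing entirely. The sample names are made up for illustration.

```python
import re

# The pattern from the answer, compiled once for reuse.
CAST_RE = re.compile(r"[-~]\s*\[(?P<CAST>[^\]]*)\]")

# Hypothetical sample filenames; the last one has no cast list.
names = [
    "(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]",
    "(Studio) - Title (Year) - [Cast1, Cast2]",
    "(Studio) - Title (Year)",
]

results = []
for name in names:
    m = CAST_RE.search(name)
    # search() returns None when the optional cast section is absent.
    results.append(m.group("CAST") if m else None)

print(results)  # ['Cast1, Cast2, Cast 3', 'Cast1, Cast2', None]
```

Note that the '-' after "(Studio)" does not produce a false match, because it is followed by "Title" rather than by '['.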

So the above will capture whatever is between the '[' and ']' characters following a '-' or '~', whether or not it contains commas. You cannot have three capture groups that share the name CAST: Python's re module rejects duplicate group names, and a single group that is repeated only keeps its last match. If you want the individual cast members, you will have to split the captured string:

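import re

s = '(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]'
# Capture everything between '[' and ']' that follows a '-' or '~'.
m = re.search(r'[-~]\s*\[(?P<CAST>[^\]]*)\]', s)
if m:
    cast = m.group('CAST')
    # Split on each comma plus any whitespace that follows it.
    print(re.split(r',\s*', cast))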

Prints:

['Cast1', 'Cast2', 'Cast 3']
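The limitation on named groups can be seen directly with the built-in re module. This sketch shows two things: defining the same group name twice raises re.error, and repeating one named group with a quantifier compiles fine but only remembers the text from the final repetition. The second pattern is illustrative only, not something from the question.

```python
import re

# The built-in re module rejects two groups with the same name.
try:
    re.compile(r"(?P<CAST>[^,]+),(?P<CAST>[^,]+)")
    duplicate_allowed = True
except re.error:
    duplicate_allowed = False

# Repeating a single named group is allowed, but the group only
# keeps the text captured by its last repetition.
m = re.fullmatch(r"\[(?:\s*(?P<CAST>[^,\]]+),?)+\]", "[Cast1, Cast2, Cast 3]")
last_only = m.group("CAST")

print(duplicate_allowed, last_only)  # False Cast 3
```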

If you are running Python 3, you can install the regex module from the PyPI repository, which has far more capabilities than the built-in re module, and then execute:

import regex

s = '(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]'
for m in regex.finditer(r'(?:[-~]\s*\[|\G(?!\A))\K\s*(?P<CAST>[^,\]]*)(?:[,\]])', s):
    print(m['CAST'])

Prints:

Cast1
Cast2
Cast 3

But what does that buy you over simply splitting the captured string?
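If the requirement is to stay with regular expressions while using only the built-in re module, one option (a sketch, not something proposed in the answer above) is to run a second re.findall over the captured text instead of calling re.split:

```python
import re

s = "(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]"

m = re.search(r"[-~]\s*\[(?P<CAST>[^\]]*)\]", s)
names = []
if m:
    # Each name starts with a character that is neither a comma nor
    # whitespace and runs up to, but not including, the next comma.
    # rstrip() drops any whitespace left before a following comma.
    names = [n.rstrip() for n in re.findall(r"[^,\s][^,]*", m.group("CAST"))]

print(names)  # ['Cast1', 'Cast2', 'Cast 3']
```

This still takes two steps, so in practice it offers little over re.split; it only avoids a non-regex split call.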