I want to use a regex to parse my data into 3 columns.
Film data:
Marvel Comics Presents (1988) #125
Spider-Man Legends Vol. II: Todd Mcfarlane Book I (Trade Paperback)
Spider-Man Legends Vol. II: Todd Mcfarlane Book I
Spider-Man Legends Vol. II: Todd Mcfarlane Book I (1998)
Marvel Comics Presents #125
Expected output: [image]
I can see how to group it, but I can't work out the regex for it: [image]
I built this expression: (.*)\((\d{4})\)(.*)
I essentially want to use the ? quantifier, like this:
(.*)\((\d{4})\)?(.*)
to say that this group may or may not be there.
Nevertheless, it's not working.
CodePudding user response:
You could use 3 capture groups, where the last 2 are optional:
^(.*?)(?:\((\d{4})\))?\s*(#\d+)?$
The pattern matches:
^  Start of string
(.*?)  Capture group 1, matching as few characters as possible
(?:\((\d{4})\))?  Optional non-capture group, capturing 4 digits in group 2
\s*  Match optional whitespace chars
(#\d+)?  Optional group 3, match # and 1+ digits
$  End of string
See a regex101 demo.
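
As a rough sketch of how the pattern could be applied, here is a version in Python. The question does not say which language or tool is used, so Python's re module, the helper name split_title and the PATTERN constant are assumptions made for illustration only.

```python
import re

# Group 1: title (lazy), group 2: optional 4-digit year, group 3: optional issue number.
PATTERN = re.compile(r"^(.*?)(?:\((\d{4})\))?\s*(#\d+)?$")


def split_title(line):
    """Split one line into (title, year, issue).

    year and issue are None when that part is missing.
    split_title is a hypothetical helper, not part of any library.
    """
    match = PATTERN.match(line)
    if match is None:
        # Only possible for input containing embedded newlines.
        return line, None, None
    title, year, issue = match.groups()
    # Group 1 keeps the space in front of the year, so trim it.
    return title.strip(), year, issue


rows = [
    "Marvel Comics Presents (1988) #125",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (Trade Paperback)",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (1998)",
    "Marvel Comics Presents #125",
]

for row in rows:
    print(split_title(row))
```

Note that "(Trade Paperback)" stays in the title, because the optional year group only accepts exactly 4 digits between the parentheses.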
