I am creating a regex to parse a string of time zones. It must accept input in the following forms:
0930
0930 10930-1
<0930
(>0930) (the brackets are just to avoid stack reading this as '<>')
(<0920 1)
(>0920 1)
0920-1240 1
1200-1-1430
1200-1-1400 1
0920-1240   <-- ISSUE HERE
The regex cannot differentiate between hhmm-1 and hhmm-hhmm. It reads '0900-1200' as '0900-1'.
I have attempted many variations of the regex, including:
r'([<>])?([0-9]{2})([0-9]{2})([ -]?)([0-1]?)|([0-9]{2})([0-9]{2})'
r'([<>])?([0-9]{2})([0-9]{2})([ -])?([0-1]?)(([0-1]?{4})()'
r'([<>])?([0-9]{2})([0-9]{2})([ -])?([0-1]?)(?([0-1]?)()'
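For example, a minimal Python reproduction of the problem with the first pattern above (the names `pattern` and `match` are only for illustration):

```python
import re

# First attempted pattern from the list above.
pattern = re.compile(
    r'([<>])?([0-9]{2})([0-9]{2})([ -]?)([0-1]?)|([0-9]{2})([0-9]{2})'
)

match = pattern.search('0900-1200')

# The optional '-' and the optional 0/1 flag consume "-1",
# so the second time is never reached.
print(match.group(0))  # prints 0900-1
```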
For now I am considering using two different regexes: one for the hyphenated time strings and another for the rest, which already works for me. I would like the output as a list of tuples, like
[('', '09', '30', '-', '','12','30', '-', '1'),
('', '09', '30', '-', '1','','', '', ''),
('>', '09', '30', '-', '1','','', '', '').....]
CodePudding user response:
You can use
([<>])?([0-9]{2})([0-9]{2})(?:([ -])([01])(?!\d{3}\b))?(?:([ -])([0-9]{2})([0-9]{2})(?:([ -])([01])(?!\d{3}\b))?)?
See the regex demo. Details:
- ([<>])? - Group 1 (optional): < or >
- ([0-9]{2}) - Group 2: two digits
- ([0-9]{2}) - Group 3: two digits
- (?:([ -])([01])(?!\d{3}\b))? - an optional group matching a sequence of:
  - ([ -]) - Group 4: a space or -
  - ([01])(?!\d{3}\b) - Group 5: 1 or 0 that is not followed with 3 more digits and a word boundary
- (?: - start of a non-capturing group:
  - ([ -]) - Group 6: a space or -
  - ([0-9]{2}) - Group 7: two digits
  - ([0-9]{2}) - Group 8: two digits
  - (?:([ -])([01])(?!\d{3}\b))? - an optional sequence of a space or - (captured in Group 9) and then 1 or 0 (captured in Group 10) that is not followed with 3 more digits and a word boundary
- )? - end of the non-capturing group, repeat 1 or 0 times.
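As a rough sketch of how this could be used from Python (assuming the standard `re` module; the names `TIME_RE`, `parse` and `samples` are made up for this example), `re.findall` returns one 10-tuple per match, with an empty string for every group that did not take part in the match:

```python
import re

# The pattern from above, split across lines only for readability.
TIME_RE = re.compile(
    r'([<>])?([0-9]{2})([0-9]{2})'
    r'(?:([ -])([01])(?!\d{3}\b))?'
    r'(?:([ -])([0-9]{2})([0-9]{2})'
    r'(?:([ -])([01])(?!\d{3}\b))?)?'
)

def parse(text):
    """Return a list of 10-tuples, one per time expression found in text."""
    return TIME_RE.findall(text)

samples = ['0930', '<0930', '>0920 1', '0920-1240', '0920-1240 1', '1200-1-1430']
for s in samples:
    print(repr(s), parse(s))

# '0920-1240'   -> [('', '09', '20', '', '', '-', '12', '40', '', '')]
# '1200-1-1430' -> [('', '12', '00', '-', '1', '-', '14', '30', '', '')]
```

Note that the group order differs from the tuples in the question: the flag after the first time lands in groups 4 and 5, and the separator before the second time in group 6, so the tuples may need reordering if the exact layout from the question is required.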
