I want to find all dates in a text that are not preceded by the word Effective. For example, I have the following line:
FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022
My regex should return ['January 7, 2022', 'January 5, 2022']
How can I do this in Python?
My attempt:
>>> import re
>>> rule = r'((?<!Effective\ )([A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}))'
>>> text = 'FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022'
>>> re.findall(rule, text)
[('anuary 1, 2022', 'anuary 1, 2022'), ('January 7, 2022', 'January 7, 2022'), ('January 5, 2022', 'January 5, 2022')]
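But it doesn't work: instead of skipping the date after Effective, it returns 'anuary 1, 2022', and every match comes back as a tuple.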
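One way to see what goes wrong is to print where each match starts. This sketch uses the attempted pattern written as a raw string, with its two capturing groups removed so `re.finditer` yields plain matches; the lookbehind itself is unchanged. The lookbehind only vetoes a match that begins directly after "Effective ", so the engine simply moves one character to the right and starts the match at the "a" of "January".

```python
import re

text = ('FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 '
        'ALASKA DISCLAIMER The January 5, 2022')

# The attempted pattern as a raw string with its two capturing groups removed
# (plain spaces instead of "\ "; the regex is otherwise the same).
attempt = r'(?<!Effective )[A-Za-z]{3,9} *\d{1,2} *, *\d{4}'

# Record where each match starts so the shifted start is visible.
matches = [(m.start(), m.group()) for m in re.finditer(attempt, text)]
for start, found in matches:
    # Show the start offset, the character just before the match, and the match.
    print(start, repr(text[start - 1]), found)
```

The first match starts at offset 24, right after the "J", which is why the result is 'anuary 1, 2022'. The tuples in the original output come from the two capturing groups: when a pattern has more than one group, `re.findall` returns a tuple of groups per match.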
Answer:
You can use
\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)
Details:
\b - a word boundary, so a match cannot start in the middle of a word (this is what stops "anuary" from matching)
(?<!Effective\s) - a negative lookbehind that fails the match if "Effective" followed by a whitespace char appears immediately to the left of the current location
[A-Za-z]{3,9} - three to nine ASCII letters
\s* - zero or more whitespace chars
\d{1,2} - one or two digits
\s*,\s* - a comma surrounded by zero or more whitespace chars
\d{4} - four digits
(?!\d) - a negative lookahead that fails the match if there is a digit immediately on the right
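A minimal usage sketch with Python's standard `re` module, run on the example line from the question. Because this pattern has no capturing groups, `re.findall` returns plain strings rather than tuples.

```python
import re

text = ('FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 '
        'ALASKA DISCLAIMER The January 5, 2022')

# \b keeps a match from starting mid-word (e.g. at "anuary"); the lookbehind
# rejects a month name that directly follows "Effective" plus one whitespace char.
pattern = r'\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)'

# No capturing groups, so findall returns a list of strings.
dates = re.findall(pattern, text)
print(dates)  # ['January 7, 2022', 'January 5, 2022']
```

One caveat: `\s` in the lookbehind stands for exactly one whitespace char, and Python's `re` only accepts fixed-width lookbehinds, so a date written as "Effective  January 1, 2022" (two spaces) would still be matched.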
