I'm trying to match type annotations like int | str and use regex substitution to replace them with the string Union[int, str].
Desired substitutions (before and after):
str|int|bool -> Union[str,int,bool]
Optional[int|tuple[str|int]] -> Optional[Union[int,tuple[Union[str,int]]]]
dict[str | int, list[B | C | Optional[D]]] -> dict[Union[str,int], list[Union[B,C,Optional[D]]]]
The regular expression I've come up with so far is as follows:
r"\w*(?:\[|,|^)[\t ]*((?'type'[a-zA-Z0-9_.\[\]]+)(?:[\t ]*\|[\t ]*(?&type))+)(?:\]|,|$)"
You can try it out in the Regex Demo. It doesn't work quite the way I want it to. The problems I've noted so far:
- It doesn't seem to handle nested Union conditions. For example, int | tuple[str|int] | bool results in one match rather than two (the outer union plus the inner one).
- The regex seems to consume an unnecessary ] at the end.
- Probably the most important one: regex subroutines such as (?&type) don't seem to be supported by the re module in Python. Here is where I got the idea to use them from.
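A quick check with only the standard library confirms the third point: re rejects both the PCRE-style (?'name'...) / (?&name) syntax and numbered recursion such as (?0). The closest construct re does accept, (?P=name), is a backreference, so it repeats the text the group matched rather than the group's sub-pattern. This is a minimal sketch; the helper name compiles is hypothetical.

```python
import re


def compiles(pattern):
    """Hypothetical helper: True if Python's built-in re accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Named group plus subroutine call, as in the question: rejected by re.
print(compiles(r"(?'type'[\w.\[\]]+)(?:\|(?&type))+"))  # False
# Numbered recursion (supported by the third-party regex module): also rejected.
print(compiles(r"\[(?:[^][|]+|(?0))*]"))  # False

# (?P=type) compiles, but it is a backreference: it must repeat the same text.
backref = re.compile(r"(?P<type>\w+)(?:\|(?P=type))+")
print(bool(backref.fullmatch("int|int")))  # True
print(bool(backref.fullmatch("int|str")))  # False
```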
Additional Info
This is mainly to support the PEP 604 syntax in Python 3.7+, which requires annotations to be forward-declared (i.e. stored as strings), since otherwise builtin types don't support the | operator.
Here's some sample code I came up with:
from __future__ import annotations

import datetime
from decimal import Decimal
from typing import Optional


class A:
    field_1: str|int|bool
    field_2: int | tuple[str|int] | bool
    field_3: Decimal|datetime.date|str
    field_4: str|Optional[int]
    field_5: Optional[int|str]
    field_6: dict[str | int, list[B | C | Optional[D]]]


class B: ...
class C: ...
class D: ...
For Python versions earlier than 3.10, I use a __future__ import to avoid the error below:
TypeError: unsupported operand type(s) for |: 'type' and 'type'
This essentially converts all annotations to strings, as below:
>>> A.__annotations__
{'field_1': 'str | int | bool', 'field_2': 'int | tuple[str | int] | bool', 'field_3': 'Decimal | datetime.date | str', 'field_4': 'str | Optional[int]', 'field_5': 'Optional[int | str]', 'field_6': 'dict[str | int, list[B | C | Optional[D]]]'}
But in code (say, in another module), I want to evaluate the annotations in A. This works in Python 3.10 but fails in Python 3.7, even though the __future__ import supports forward-declared annotations.
>>> from typing import get_type_hints
>>> hints = get_type_hints(A)
Traceback (most recent call last):
eval(self.__forward_code__, globalns, localns),
File "<string>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'type' and 'type'
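The TypeError comes from actually applying | to two classes, which with the __future__ import only happens once get_type_hints evaluates the annotation string. A minimal sketch for checking whether the running interpreter supports the operator on builtin classes (the helper name runtime_pep604_supported is hypothetical); it returns False before Python 3.10, the release that implemented PEP 604:

```python
import sys


def runtime_pep604_supported() -> bool:
    """Hypothetical helper: can | be applied to two classes at runtime?"""
    try:
        _ = int | str  # raises TypeError before Python 3.10
    except TypeError:
        return False
    return True


print(sys.version_info[:2], runtime_pep604_supported())
```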
It seems the best approach to make this work is to replace all occurrences of int | str (for example) with Union[int, str]. Then, with typing.Union included in the additional localns used to evaluate the annotations, it should be possible to evaluate PEP 604-style annotations in Python 3.7+.
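As an alternative to a recursive regex, the rewrite can also be done with a small recursive-descent parser using only the standard library. This is a swapped-in technique, not the approach taken in the answer below, and the names pep604_to_union, _tokenize and _Parser are hypothetical. It assumes annotations consist only of (possibly dotted) names, square brackets, commas and |; string literals (e.g. inside Literal[...]) and bare nested lists such as Callable[[int], str] are not handled, and the output is normalized to ", " separators.

```python
import re
from typing import List, Optional, Union

# Tokens: (dotted) names such as str or datetime.date, the punctuation
# characters [ ] , | and runs of whitespace (which are dropped).
_TOKEN_RE = re.compile(r"[\w.]+|[\[\],|]|\s+")
_PUNCT = {"[", "]", ",", "|"}


def _tokenize(text: str) -> List[str]:
    tokens = []
    pos = 0
    while pos < len(text):
        m = _TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if not m.group().isspace():
            tokens.append(m.group())
        pos = m.end()
    return tokens


class _Parser:
    """Recursive descent over the grammar
        union := term ('|' term)*
        term  := NAME ('[' union (',' union)* ']')?
    """

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            wanted = expected if expected is not None else "a token"
            raise ValueError(f"expected {wanted!r}, got {tok!r}")
        self.pos += 1
        return tok

    def union(self) -> str:
        terms = [self.term()]
        while self.peek() == "|":
            self.take("|")
            terms.append(self.term())
        # A lone term is returned unchanged; two or more become Union[...].
        return terms[0] if len(terms) == 1 else "Union[" + ", ".join(terms) + "]"

    def term(self) -> str:
        name = self.take()
        if name in _PUNCT:
            raise ValueError(f"expected a type name, got {name!r}")
        if self.peek() != "[":
            return name
        self.take("[")
        args = [self.union()]  # every subscript argument may itself be a union
        while self.peek() == ",":
            self.take(",")
            args.append(self.union())
        self.take("]")
        return name + "[" + ", ".join(args) + "]"


def pep604_to_union(annotation: str) -> str:
    """Rewrite every 'A | B' in an annotation string as 'Union[A, B]'."""
    parser = _Parser(_tokenize(annotation))
    result = parser.union()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return result


for s in ("str|int|bool",
          "Optional[int|tuple[str|int]]",
          "dict[str | int, list[B | C | Optional[D]]]"):
    print(pep604_to_union(s))
# Union[str, int, bool]
# Optional[Union[int, tuple[Union[str, int]]]]
# dict[Union[str, int], list[Union[B, C, Optional[D]]]]

# Evaluate with Union/Optional supplied only through the locals mapping.
hint = eval(pep604_to_union("str|Optional[int]"), {}, {"Union": Union, "Optional": Optional})
print(hint == Union[str, int, None])  # True
```

Note that evaluating subscripted builtins such as tuple[...] or dict[...] still requires Python 3.9+ (PEP 585), so on 3.7 those parts of field_2 and field_6 would fail even after the | rewrite.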
Answer:
You can install the PyPI regex module (since re does not support recursion) and use:
import regex

text = "str|int|bool\nOptional[int|tuple[str|int]]\ndict[str | int, list[B | C | Optional[D]]]"
# WORD[...] whose contents are pipe-separated WORD or WORD[...] items
rx = r"(\w+\[)(\w+(\[(?:[^][|]+|(?3))*])?(?:\s*\|\s*\w+(\[(?:[^][|]+|(?4))*])?)+)]"
n = 1
res = text
# Repeat until a pass makes no replacements (inner unions are rewritten first)
while n != 0:
    res, n = regex.subn(rx, lambda x: "{}Union[{}]]".format(x.group(1), regex.sub(r'\s*\|\s*', ',', x.group(2))), res)

# Finally, wrap the remaining runs of pipe-separated words
print(regex.sub(r'\w+(?:\s*\|\s*\w+)+', lambda z: "Union[{}]".format(regex.sub(r'\s*\|\s*', ',', z.group())), res))
Output:
Union[str,int,bool]
Optional[Union[int,tuple[Union[str,int]]]]
dict[Union[str,int], list[Union[B,C,Optional[D]]]]
See the Python demo.
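The first regex finds every WORD[...] whose contents are two or more pipe-separated items, each item being a WORD or a WORD[...] with no pipe chars inside it. The while loop repeats the substitution until a pass changes nothing, so nested unions are rewritten from the innermost outward.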
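The \w+(?:\s*\|\s*\w+)+ regex matches two or more words separated with pipes and optional whitespace, which covers the remaining unions that are not the whole content of a WORD[...], such as str|int|bool or the str | int inside dict[...].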
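The second (flat-case) pattern uses no recursion, so it also works with the built-in re module. A small sketch (to_union and the pattern names are hypothetical) shows one caveat: \w does not match ".", so a qualified name such as datetime.date from field_3 in the question is split at the dot. Widening \w+ to [\w.]+ is one possible fix; the same consideration applies to the \w+ parts of the first pattern.

```python
import re

# Separator used by the answer: a pipe with optional whitespace around it.
PIPE_SEP = re.compile(r"\s*\|\s*")
# The answer's flat-case pattern: two or more words joined by pipes.
FLAT_UNION = re.compile(r"\w+(?:\s*\|\s*\w+)+")
# Hypothetical variant that also accepts dots, so qualified names such as
# datetime.date are kept in one piece.
FLAT_UNION_DOTTED = re.compile(r"[\w.]+(?:\s*\|\s*[\w.]+)+")


def to_union(pattern, text):
    """Wrap each pipe-separated run matched by pattern in Union[...]."""
    return pattern.sub(lambda m: "Union[{}]".format(PIPE_SEP.sub(",", m.group())), text)


print(to_union(FLAT_UNION, "str|int|bool"))                      # Union[str,int,bool]
print(to_union(FLAT_UNION, "Decimal|datetime.date|str"))         # Union[Decimal,datetime].Union[date,str]
print(to_union(FLAT_UNION_DOTTED, "Decimal|datetime.date|str"))  # Union[Decimal,datetime.date,str]
```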
Details of the first pattern:
- (\w+\[) - Group 1 (kept as is at the beginning of the replacement): one or more word chars and then a [ char
- (\w+(\[(?:[^][|]+|(?3))*])?(?:\s*\|\s*\w+(\[(?:[^][|]+|(?4))*])?)+) - Group 2 (put inside Union[...] with every \s*\|\s* match replaced with ,):
  - \w+ - one or more word chars
  - (\[(?:[^][|]+|(?3))*])? - an optional Group 3 that matches a [ char, followed by zero or more occurrences of either one or more chars other than [, ] and |, or the whole Group 3 pattern recursed (hence it matches nested brackets), and then a ] char
  - (?:\s*\|\s*\w+(\[(?:[^][|]+|(?4))*])?)+ - one or more occurrences (so the match contains at least one pipe char to replace with ,) of:
    - \s*\|\s* - a pipe char enclosed with zero or more whitespaces
    - \w+ - one or more word chars
    - (\[(?:[^][|]+|(?4))*])? - an optional Group 4 (matches the same thing as Group 3; note the (?4) subroutine recurses into the Group 4 pattern)
- ] - a ] char.
