Using a bytes regular expression directly works fine:
In [48]: regexp_1 = re.compile(b"\xab.{3}")
In [49]: regexp_1.fullmatch(b"\xab\x66\x77\x88")
Out[49]: <re.Match object; span=(0, 4), match=b'\xabfw\x88'> # <----- good !
When I try to build the pattern by formatting the bytes sequence, as suggested in this post, it fails:
In [50]: byte = b"\xab"
In [51]: regexp_2 = re.compile(f"{byte}.{3}".encode())
In [52]: regexp_2.fullmatch(b"\xab\x66\x77\x88")
In [53]: # nothing found ... why ?
CodePudding user response:
This happens because an f-string converts each interpolated object to a string, and for a bytes object that string is its repr rather than its contents:
>>> str(byte)
"b'\\xab'"
So when you interpolate it in an f-string as you did, the b'...' wrapper and the backslash escape become part of the text, and they are still there after encoding:
>>> f"{byte}.{3}"
"b'\\xab'.3"
>>> f"{byte}.{3}".encode()
b"b'\\xab'.3"
Also, {3} is treated as a replacement field and evaluates to 3. To keep a literal {3} you can double the braces ({{3}}), but that is not the main issue here.
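To illustrate the point about the braces, here is a small sketch (the variable names `pattern_text`, `pattern` and `result` are mine, not from the question). It doubles the braces so the quantifier survives, and shows that the match still fails because the repr of the bytes object is embedded in the pattern:

```python
import re

byte = b"\xab"

# Doubled braces keep the literal quantifier {3} in the formatted text,
# but {byte} is still rendered as the repr of the bytes object.
pattern_text = f"{byte}.{{3}}"
print(pattern_text)

# After encoding, the pattern starts with the literal characters b' and
# also contains the closing quote, so it cannot match the raw input.
pattern = re.compile(pattern_text.encode())
result = pattern.fullmatch(b"\xab\x66\x77\x88")
print(result)  # None: the input does not start with the characters b'
```

So fixing the braces alone is not enough; the bytes value has to reach the pattern without passing through `str()`.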
I recommend concatenating the bytes objects instead.
regexp = re.compile(byte + b'.{3}')
regexp.fullmatch(b"\xab\x66\x77\x88")
# <re.Match object; span=(0, 4), match=b'\xabfw\x88'>
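If the prefix or the repeat count comes from a variable, a slightly more defensive variant of the same idea might look like the sketch below. The helper name `build_pattern` and its parameters are hypothetical; the sketch assumes you want the prefix matched literally, so it passes it through `re.escape`, and it builds the quantifier with bytes `%` formatting (available since Python 3.5) instead of an f-string:

```python
import re


def build_pattern(prefix: bytes, count: int) -> re.Pattern:
    """Hypothetical helper: `prefix` followed by exactly `count` bytes."""
    # bytes supports %-formatting, so no str round trip is needed.
    quantifier = b"{%d}" % count
    # re.escape keeps a prefix such as b"." or b"+" from acting as regex syntax.
    return re.compile(re.escape(prefix) + b"." + quantifier)


regexp = build_pattern(b"\xab", 3)
match = regexp.fullmatch(b"\xab\x66\x77\x88")
print(match.span())  # (0, 4)
```

Without `re.escape`, a prefix like `b"."` would match any byte instead of a literal dot, which is easy to miss when the prefix is binary data.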
