I'm trying to read the raw contents of a binary file, so they can be manipulated in memory. As far as I understand, bytes() objects are immutable, while bytearray() objects are mutable, so I read the file into a bytearray and then try to modify the latter:
raw_data = bytearray()
try:
    with open(input_file, "rb") as f:
        raw_data = f.read()
except IOError:
    print('Error opening', input_file)
raw_data[0] = 55  # attempt to modify the first byte
However, this last line raises a TypeError: 'bytes' object does not support item assignment.
Wait... what 'bytes' object?
Let's look into the actual data types reported by Python, before and after the array is populated:
raw_data = bytearray()
print('Before:', type(raw_data))
try:
    with open(input_file, "rb") as f:
        raw_data = f.read()
except IOError:
    print('Error opening', input_file)
print('After: ', type(raw_data))
Output:
Before: <class 'bytearray'>
After: <class 'bytes'>
So what's going on here? Why is the type modified, and can I prevent it?
I can always create another bytearray object from the contents of raw_data, but it'd be nice if I could save memory and just modify the original in place.
Answer:
Why is the type modified? Look at the following:
>>> x = 12
>>> type(x)
<class 'int'>
>>> x = 7.0
>>> type(x)
<class 'float'>
I first assigned 12 to x, so x referred to an int. Then I assigned 7.0 to x, and x now refers to a float. Assignment does not modify the object a name previously referred to; it rebinds the name to a different object. This is Python's dynamic typing at work.
So it doesn't matter that you initially assigned a bytearray instance to raw_data. What counts is the last assignment to raw_data, which was:
raw_data = f.read()
And for a file opened in binary mode, f.read() returns a bytes object.
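To make the difference between rebinding a name and mutating an object concrete, here is a minimal sketch. The variable names (buf, original_id, and so on) are only illustrative and do not come from the question:

```python
buf = bytearray(b"abc")
original_id = id(buf)

# Slice assignment replaces the contents of the existing bytearray in place.
buf[:] = b"hello"
same_object = id(buf) == original_id
type_after_slice = type(buf)

# Plain assignment rebinds the name to a different object (here, a bytes object).
buf = b"hello"
type_after_rebind = type(buf)

print(same_object, type_after_slice.__name__, type_after_rebind.__name__)
# prints: True bytearray bytes
```

Note that writing raw_data[:] = f.read() would keep the bytearray type, but f.read() still builds a temporary bytes object first, so it does not by itself avoid the extra copy.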
The way you get around this is by pre-allocating a bytearray with the correct size and using readinto:
with open(input_file, mode="rb") as f:
    # Seek to end of file and return offset from beginning:
    file_size = f.seek(0, 2)
    # Seek back to beginning:
    f.seek(0, 0)
    # Pre-allocate bytearray:
    raw_data = bytearray(file_size)
    f.readinto(raw_data)
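For a version that can be run end to end, the sketch below wraps the same idea in a helper and tries it on a throwaway temporary file. The helper name read_file_mutable and the demo bytes are assumptions made for illustration, and it uses os.fstat to get the size instead of seeking, which is one possible alternative rather than the only way:

```python
import os
import tempfile


def read_file_mutable(path):
    """Read a whole file into a mutable bytearray (hypothetical helper)."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        buf = bytearray(size)
        # readinto fills the existing buffer and returns the number of bytes read.
        n = f.readinto(buf)
        # Trim the buffer in case fewer bytes were read than expected.
        del buf[n:]
    return buf


# Create a small temporary file to demonstrate on.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01\x02\x03")
    demo_path = tmp.name

try:
    data = read_file_mutable(demo_path)
    data[0] = 55  # item assignment works because data is a bytearray
finally:
    os.remove(demo_path)

print(type(data).__name__, list(data))
# prints: bytearray [55, 1, 2, 3]
```

If the extra copy is acceptable, bytearray(f.read()) is the shorter option mentioned in the question; readinto is mainly worthwhile when the file is large enough that holding two copies at once matters.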
