Consider the simplest possible function:
import numba

@numba.jit
def foo(s1):
    return s1
Now construct an array of np.bytes_ objects:
> a = np.array(['abc']*5, dtype='S5')
> a
array([b'abc', b'abc', b'abc', b'abc', b'abc'], dtype='|S5')
Why does calling foo with the whole array work:
> foo(a)
array([b'abc', b'abc', b'abc', b'abc', b'abc'], dtype='|S5')
But calling foo with a single element raises an exception:
> foo(a[0])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8124/2559272744.py in <module>
----> 1 foo(a[0])
TypeError: bad argument type for built-in operation
(This is running numba 0.54.1 from conda-forge on Windows with Python 3.9.7 and numpy 1.20.3)
Answer:
Neither bytes nor np.bytes_ is listed among the types supported by Numba as of the latest release. The closest things it supports are:
- Character sequences (read: str), though the documentation specifically says "no operations are available on them", so this is pretty useless; your function would work if you called foo(a[0].decode()) to turn the value into text (but only because it's a pretty useless function).
- Actual numpy arrays; the cost of viewing the bytes/np.bytes_ as an np.array is pretty low, so you could just call foo(np.frombuffer(a[0], np.uint8)) and get something that is more useful programmatically and represents the same data.
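A minimal sketch of the np.frombuffer route, assuming the goal is to do real work on the raw bytes inside compiled code rather than just return them. count_byte is a hypothetical example function (not part of Numba or NumPy), and the whole-array a.view(np.uint8) variant is an extra assumption beyond the answer above; the try/except fallback only exists so the snippet still runs as plain Python when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba; with Numba installed,
    # count_byte is compiled in nopython mode.
    def njit(func):
        return func


@njit
def count_byte(buf, target):
    # Hypothetical example: count occurrences of one byte value.
    # buf is a 1-D uint8 array, a type Numba handles natively.
    n = 0
    for i in range(buf.shape[0]):
        if buf[i] == target:
            n += 1
    return n


a = np.array(['abc'] * 5, dtype='S5')

# View a single np.bytes_ element as uint8 without copying.
# The result is read-only because it shares memory with an immutable bytes object.
buf = np.frombuffer(a[0], dtype=np.uint8)
print(buf)                        # [97 98 99]
print(count_byte(buf, ord('b')))  # 1

# The whole fixed-width array can be viewed the same way: one row per element,
# itemsize (5) columns, with unused positions padded by zero bytes.
rows = a.view(np.uint8).reshape(a.shape[0], a.itemsize)
print(rows[0])                    # [97 98 99  0  0]
print(count_byte(rows[0], 0))     # 2
```

Working on uint8 arrays keeps everything inside the array types Numba supports well, at the cost of handling the zero padding of fixed-width 'S' data yourself.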
Answer:
Like the str type, the bytes type is only barely supported: the support is minimal and very inefficient. Moreover, there are some open related bugs (like this one). Furthermore, AFAIK, there is no plan to work on this any time soon.
From my understanding, a[0] returns a numpy.bytes_ object, which is not fully compatible with bytes (at least as far as Numba is concerned). Compiling the function for numpy.bytes_ appears to trigger a bug that makes Numba confuse numpy.bytes_ and bytes (Numba tries to use a compiled function with the wrong type).
Indeed, the following code works:
import numba
import numpy as np
a = np.array(['abc']*5, dtype='S5')

@numba.jit
def foo(s1):
    return s1

foo(b'test')      # Works
foo(bytes(a[0]))  # Works
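The difference between a[0] and bytes(a[0]) can be seen without Numba at all. This sketch only inspects the Python types involved (it does not reproduce the Numba bug itself); the relevant point is that Numba picks a compiled specialization from the argument's type, so two values that compare equal can still be typed differently.

```python
import numpy as np

a = np.array(['abc'] * 5, dtype='S5')
elem = a[0]

# Indexing a fixed-width bytes array returns a NumPy scalar, not a plain bytes object
print(type(elem) is np.bytes_)   # True
# np.bytes_ subclasses bytes, so isinstance succeeds...
print(isinstance(elem, bytes))   # True
# ...but its exact type is not bytes
print(type(elem) is bytes)       # False

# bytes(...) builds an object whose exact type is bytes, with the same content
plain = bytes(elem)
print(type(plain) is bytes)      # True
print(plain == elem)             # True
```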
The following code fails:
import numba
import numpy as np
a = np.array(['abc']*5, dtype='S5')

@numba.jit
def foo(s1):
    return s1

foo(a[0])         # Fails and triggers the bug
foo(bytes(a[0]))  # Now fails too (the function is not recompiled properly)
foo(b'test')      # Also fails (the function is not recompiled properly)
Note that the bytes type is only supported in read-only mode.
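Building on the observation that bytes(a[0]) works as long as the jitted function never sees a numpy.bytes_ first, one possible workaround is to convert np.bytes_ values to plain bytes before they reach the dispatcher. This is a sketch under that assumption, using only the read-only operations Numba documents for bytes (indexing, iteration, len()); byte_sum and byte_sum_safe are hypothetical names, and the try/except fallback only lets the snippet run without Numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs as plain Python without Numba
    def njit(func):
        return func


@njit
def byte_sum(s):
    # Read-only access: iterate over the bytes and add up their values
    total = 0
    for b in s:
        total += b
    return total


def byte_sum_safe(value):
    # Hypothetical wrapper: normalise np.bytes_ scalars to plain bytes so the
    # jitted function is never compiled for numpy.bytes_ arguments
    if isinstance(value, np.bytes_):
        value = bytes(value)
    return byte_sum(value)


a = np.array(['abc'] * 5, dtype='S5')
print(byte_sum_safe(a[0]))     # 294 (97 + 98 + 99)
print(byte_sum_safe(b'test'))  # 448
```

The conversion copies the element, which is cheap for short strings; for large volumes of data the uint8 array view avoids both the copy and the bytes type entirely.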
