Is NumPy optimized to work on an array of arrays?

Some answers on Stack Overflow suggest using an ndarray of ndarrays when working with data in which the number of elements per row is not constant (see "How to make a multidimension numpy array with a varying row size?").

Is NumPy optimized to work on a structure like that?

Here's a simplified example of such a structure:

import numpy as np
x = np.array([1,2,3])
y = np.array([4,5])
data = np.array([x,y],dtype=object)

It's possible to do operations like:

print(data + 1)
print(data + data)

But some operations fail, such as:

print(np.sum(data))

What's happening behind the scenes with this type of structure?

CodePudding user response：

Like a list, an object dtype array can contain objects of any kind. For example:

In [6]: arr = np.array([1,"two",[1,2,3],np.array([4,5,6])], object)
In [7]: arr
Out[7]: array([1, 'two', list([1, 2, 3]), array([4, 5, 6])], dtype=object)

Look what happens when we do addition:

In [8]: arr + arr
Out[8]: 
array([2, 'twotwo', list([1, 2, 3, 1, 2, 3]), array([ 8, 10, 12])],
      dtype=object)
In [10]: arr*2
Out[10]: 
array([2, 'twotwo', list([1, 2, 3, 1, 2, 3]), array([ 8, 10, 12])],
      dtype=object)

For lists and strings, these operations are defined as concatenation (for +) and replication (for *). The addition is in effect doing [x.__add__(x) for x in arr], where __add__ is the class-specific operation; arr*2 similarly does [x.__mul__(2) for x in arr].

np.exp doesn't work because it tries to do [x.exp() for x in arr], and almost no one defines an exp method.
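To make the "in effect" claim concrete, here is a minimal sketch comparing the object-array addition with an explicit Python loop over the elements. The array is filled element by element with np.empty (my choice, not something the answer does) so that its shape is unambiguous; the names summed and manual are made up for illustration.

```python
import numpy as np

# Build the same mixed-type object array, one slot at a time.
arr = np.empty(4, dtype=object)
arr[0] = 1
arr[1] = "two"
arr[2] = [1, 2, 3]
arr[3] = np.array([4, 5, 6])

# Object-dtype addition: NumPy applies each element's own "+".
summed = arr + arr

# The explicit per-element loop it is roughly equivalent to.
manual = [x + x for x in arr]

for a, b in zip(summed, manual):
    print(type(a).__name__, a, b)
```

Both versions run the addition once per element at Python level, so an object array should not be expected to get the compiled-loop speed of a numeric dtype.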

In [11]: np.exp(arr)
AttributeError: 'int' object has no attribute 'exp'

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<ipython-input-11-16c1c90aa297>", line 1, in <module>
    np.exp(arr)
TypeError: loop of ufunc does not support argument 0 of type int which has no callable exp method
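For the ragged structure in the question, one possible workaround is to apply the numeric function to each row yourself, since each row is an ordinary numeric array. The sketch below is only an illustration under that assumption; row_exp, total and flat_total are hypothetical names, and whether this is fast enough depends on the data.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5])
data = np.array([x, y], dtype=object)  # ragged rows, as in the question

# np.exp on the object array looks for an .exp method on each element,
# which ndarray does not have, so it raises TypeError.
try:
    np.exp(data)
    exp_failed = False
except TypeError:
    exp_failed = True

# np.sum reduces with "+", i.e. x + y, and the shapes (3,) and (2,)
# cannot be broadcast together.
try:
    np.sum(data)
    sum_failed = False
except ValueError:
    sum_failed = True

# Workaround: loop over rows in Python, use fast NumPy code inside each row.
row_exp = [np.exp(row) for row in data]
total = sum(row.sum() for row in data)

# Alternative: flatten the rows into one numeric array first.
flat_total = np.concatenate(list(data)).sum()

print(exp_failed, sum_failed, total, flat_total)
```

Keeping the rows in a plain Python list instead of an object array would work just as well for this pattern; the object array mainly adds elementwise operator support such as data + 1.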