Row indices of unique numpy array (not element) from a larger numpy array

I have two numpy arrays 'a' and 'b'.

'a' has shape [30000, 2] and contains pairs of x,y coordinates. 'b' has shape [10000000, 3] and contains x,y,z coordinates.

Each x,y coordinate pair from 'a' occurs exactly once (i.e. uniquely) in 'b'. I want to efficiently extract the corresponding z coordinates from 'b'.

Here's a simple example...

import numpy as np

a = np.array([[1,2], [3,4], [5,6], [8,9]]).T
b = np.array([[1,2,11], [1,3,12], [3,4,13], [4,5,14],[5,6,15], [6,7,16], [7,8,17], [8,9,18]]).T

This should return the indices [0,2,4,7] of the matching points in 'b', such that z = [11, 13, 15, 18]

Obviously this can be achieved with 2 for loops (YUCK!!!)

I'm sure this is a simple problem but it has me stumped...

What's the most efficient way to achieve this? (especially for larger datasets)

CodePudding user response：

You can transform your 2D array into a 1D view (see this answer), then use numpy.isin:

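def view1D(a, b):
    # View each row as one opaque (void) element so that whole rows compare as units.
    # Both arrays must have the same dtype and the same number of columns.
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

# The example stores points column-wise, so transpose to one point per row
A,B = view1D(a.T, b[:2].T)

# Boolean mask over the points of b whose (x, y) appears in a, then take z
b.T[np.isin(B, A)][:,2]
# array([11, 13, 15, 18])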
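
Note that a boolean mask gives the z values in the order of 'b', not of 'a'. If you need the row indices themselves, aligned with the rows of 'a', one alternative is to encode each x,y pair as a single integer key and binary-search it. This is only a minimal sketch: the function name lookup_z is made up for illustration, it takes the arrays in the [N,2] and [M,3] layout described in the question (no .T), and it assumes non-negative integer coordinates small enough that x * width + y does not overflow int64.

```python
import numpy as np

def lookup_z(a_xy, b_xyz):
    """Return (row_indices, z) for each (x, y) row of a_xy, in a_xy's order.

    Assumes non-negative integer coordinates and that every pair in a_xy
    occurs exactly once in b_xyz, as stated in the question.
    """
    a_xy = np.asarray(a_xy, dtype=np.int64)
    b_xyz = np.asarray(b_xyz)
    b_xy = b_xyz[:, :2].astype(np.int64)
    # Encode each (x, y) pair as one integer: key = x * width + y, with y < width.
    width = max(a_xy[:, 1].max(), b_xy[:, 1].max()) + 1
    a_key = a_xy[:, 0] * width + a_xy[:, 1]
    b_key = b_xy[:, 0] * width + b_xy[:, 1]
    # Sort the keys of b once, then binary-search every key of a.
    order = np.argsort(b_key, kind="stable")
    pos = np.searchsorted(b_key[order], a_key)
    rows = order[pos]  # map sorted positions back to original rows of b
    return rows, b_xyz[rows, 2]

a = np.array([[1, 2], [3, 4], [5, 6], [8, 9]])
b = np.array([[1, 2, 11], [1, 3, 12], [3, 4, 13], [4, 5, 14],
              [5, 6, 15], [6, 7, 16], [7, 8, 17], [8, 9, 18]])
rows, z = lookup_z(a, b)
print(rows)  # [0 2 4 7]
print(z)     # [11 13 15 18]
```

The cost is dominated by sorting the keys of 'b' once; each lookup is then a binary search. If a pair from 'a' were missing from 'b', this sketch would silently return a wrong row, so a check such as (b_key[rows] == a_key).all() inside the function would be worth adding in that case.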

CodePudding user response：

Another option, if you do not want to flatten your arrays, is to make them the same size and compare them element-wise:

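a, b = a.T, b.T  # back to one point per row: a is (4, 2), b is (8, 3)
n_a, n_b = a.shape[0], b.shape[0]
# Repeat each row of a n_b times: row k of tile_a is a[k // n_b]
tile_a = np.tile(a, n_b).reshape(n_a * n_b, a.shape[1])
# Stack the x,y columns of b n_a times: row k of tile_b is b[k % n_b, :2]
tile_b = np.concatenate([b[:, :2]] * n_a)
# Flat positions where both x and y match, mapped back to row indices of b
indices = np.flatnonzero((tile_a == tile_b).all(axis=1)) % n_b

print(b[indices, 2])
#[11 13 15 18]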
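
Be aware that tiling both arrays to a common size needs len(a) * len(b) rows, which for 30000 and 10,000,000 points will not fit in memory. The same element-wise comparison can be done with broadcasting over slices of 'b' instead. This is only a rough sketch: the function name match_rows_chunked and the chunk parameter are made up for illustration, and it takes one point per row (no .T).

```python
import numpy as np

def match_rows_chunked(a_xy, b_xyz, chunk=1024):
    """For each row of a_xy, return the index of the matching row in b_xyz.

    Rows of a_xy with no match keep the placeholder value -1.
    """
    a_xy = np.asarray(a_xy)
    b_xyz = np.asarray(b_xyz)
    rows = np.full(len(a_xy), -1, dtype=np.intp)
    for start in range(0, len(b_xyz), chunk):
        block = b_xyz[start:start + chunk, :2]
        # eq[i, j] is True when a_xy[i] equals block[j] in both x and y.
        eq = (a_xy[:, None, :] == block[None, :, :]).all(axis=2)
        i, j = np.nonzero(eq)
        rows[i] = start + j  # convert block-local index to a row index of b
    return rows

a = np.array([[1, 2], [3, 4], [5, 6], [8, 9]])
b = np.array([[1, 2, 11], [1, 3, 12], [3, 4, 13], [4, 5, 14],
              [5, 6, 15], [6, 7, 16], [7, 8, 17], [8, 9, 18]])
rows = match_rows_chunked(a, b, chunk=3)
print(rows)        # [0 2 4 7]
print(b[rows, 2])  # [11 13 15 18]
```

This keeps peak memory at roughly len(a) * chunk * 2 comparison results, but it still performs len(a) * len(b) comparisons in total, so for the sizes in the question a sort-based or hash-based lookup is likely to be much faster.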