I have two numpy arrays 'a' and 'b'.
'a' has shape (30000, 2) and contains pairs of x,y coordinates. 'b' has shape (10000000, 3) and contains x,y,z coordinates.
Each x,y pair from 'a' occurs exactly once (i.e. uniquely) in 'b'. I want to efficiently extract the corresponding z coordinates from 'b'.
Here's a simple example (transposed, so each column is one point)...
a = np.array([[1,2], [3,4], [5,6], [8,9]]).T
b = np.array([[1,2,11], [1,3,12], [3,4,13], [4,5,14],[5,6,15], [6,7,16], [7,8,17], [8,9,18]]).T
This should return the indices [0,2,4,7] of the matching points in 'b', so that z = [11, 13, 15, 18].
Obviously this can be achieved with 2 for loops (YUCK!!!)
I'm sure this is a simple problem but it has me stumped...
What's the most efficient way to achieve this? (especially for larger datasets)
CodePudding user response:
You can transform your 2D array into a 1D view (see this answer), then use numpy.isin:
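def view1D(a, b):
    # View each row as one opaque void scalar so whole rows compare as single items.
    # Assumes a and b share the same dtype and the same number of columns.
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

# The example stores points column-wise, so transpose to one point per row
A, B = view1D(a.T, b[:2].T)
b.T[np.isin(B, A)][:, 2]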
# array([11, 13, 15, 18])
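One thing to keep in mind is that a boolean mask from np.isin returns the matches in the order they appear in b, not in the order of a. If the order of a matters, a sorted-key lookup is one alternative. The following is only a minimal sketch, not part of the answer above: the function name lookup_z is made up for illustration, it expects one point per row (shapes (N, 2) and (M, 3), as in the question's description), and it assumes the x,y values are non-negative integers small enough that x * width + y fits in int64.

```python
import numpy as np

def lookup_z(a_xy, b_xyz):
    """Return (rows, z) where rows[i] is the row of b_xyz matching a_xy[i].

    Hypothetical helper. Assumes non-negative integer x,y values and that
    every pair in a_xy occurs exactly once in b_xyz.
    """
    a_xy = np.asarray(a_xy)
    b_xyz = np.asarray(b_xyz)
    ax, ay = a_xy[:, 0].astype(np.int64), a_xy[:, 1].astype(np.int64)
    bx, by = b_xyz[:, 0].astype(np.int64), b_xyz[:, 1].astype(np.int64)

    # Collapse each (x, y) pair into one integer key.
    width = max(ay.max(), by.max()) + 1
    a_key = ax * width + ay
    b_key = bx * width + by

    # Sort b's keys once, then binary-search every key from a.
    order = np.argsort(b_key, kind="stable")
    pos = np.searchsorted(b_key[order], a_key)
    pos = np.minimum(pos, len(b_key) - 1)  # keep indices in range
    rows = order[pos]

    # Guard against pairs that are missing from b.
    if not np.array_equal(b_key[rows], a_key):
        raise ValueError("some pairs from a were not found in b")
    return rows, b_xyz[rows, 2]

a = np.array([[1, 2], [3, 4], [5, 6], [8, 9]])
b = np.array([[1, 2, 11], [1, 3, 12], [3, 4, 13], [4, 5, 14],
              [5, 6, 15], [6, 7, 16], [7, 8, 17], [8, 9, 18]])
rows, z = lookup_z(a, b)
print(rows)  # [0 2 4 7]
print(z)     # [11 13 15 18]
```

The cost is dominated by sorting b's keys once, after which each lookup is a binary search. For float coordinates the integer key would not apply as written.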
CodePudding user response:
Another option, if you do not want to flatten your arrays, is to repeat them to the same length and compare them row by row. This builds len(a) * len(b) rows, so it is only practical for small inputs:
a, b = a.T, b.T
n_a, n_b = a.shape[0], b.shape[0]
# Row i * n_b + j of each stacked array pairs a[i] with b[j]
tile_a = np.repeat(a, n_b, axis=0)
tile_b = np.tile(b[:, :2], (n_a, 1))
indices = np.flatnonzero((tile_a == tile_b).all(axis=1))
# The flat index is i * n_b + j, so the row in b is the remainder mod n_b
print(b[indices % n_b, 2])
#[11 13 15 18]
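The same pairwise comparison can also be written with broadcasting and run over a in chunks, so the temporary array stays bounded. This is a rough sketch under stated assumptions rather than a tuned solution: match_rows_chunked and its chunk parameter are hypothetical names, the arrays hold one point per row, and every pair in a is assumed to be present in b (argmax would silently return 0 for a missing pair).

```python
import numpy as np

def match_rows_chunked(a_xy, b_xyz, chunk=256):
    """Return, for each row of a_xy, the index of the matching row in b_xyz.

    Hypothetical helper. Assumes every pair in a_xy exists in b_xyz.
    """
    a_xy = np.asarray(a_xy)
    b_xy = np.asarray(b_xyz)[:, :2]
    rows = np.empty(len(a_xy), dtype=np.intp)
    for start in range(0, len(a_xy), chunk):
        block = a_xy[start:start + chunk]
        # hit[i, j] is True where block[i] equals b_xy[j] in both columns
        hit = (block[:, None, :] == b_xy[None, :, :]).all(axis=2)
        # Position of the first True in each row of hit
        rows[start:start + chunk] = hit.argmax(axis=1)
    return rows

a = np.array([[1, 2], [3, 4], [5, 6], [8, 9]])
b = np.array([[1, 2, 11], [1, 3, 12], [3, 4, 13], [4, 5, 14],
              [5, 6, 15], [6, 7, 16], [7, 8, 17], [8, 9, 18]])
rows = match_rows_chunked(a, b, chunk=3)
print(b[rows, 2])  # [11 13 15 18]
```

Each chunk allocates roughly chunk * len(b) * 2 booleans, and the total work is still proportional to len(a) * len(b), so for the sizes in the question a sort-based or hash-based lookup is likely the better fit.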
