Numpy dot product of 3D arrays with shapes (X, Y, Z) and (X, Y, 1)-CodePudding

I have 2 numpy 3D arrays: A of shape (X, Y, Z) and B of shape (X, Y, 1). I need to perform dot product of each column of A with the single column of B, obtaining another array C of shape (X, Z).
I managed do achieve the results by allocating C, iterating over the 1st shape (X) and saving the results during the loop.

Here is a sample code:

import numpy as np

np.random.seed(10)

x, y, z = 7, 2048, 10

a = np.random.randint(0, 10, (x, y, z))
b = np.random.randint(0, 10, (x, y, 1))
c = np.zeros((x, z))

for i in range(x):
    c[i] = np.dot(b[i].T, a[i])

print(c)

With the output:

[[40223. 42505. 41040. 40772. 41213. 40311. 41813. 41632. 40578. 40859.]
 [40984. 42119. 41512. 40948. 40725. 40222. 41182. 42255. 41916. 41948.]
 [41824. 41908. 42118. 39690. 40537. 41394. 42446. 41598. 40710. 42171.]
 [41664. 41949. 40847. 39915. 41888. 41565. 40992. 41354. 41227. 41948.]
 [42766. 41490. 41291. 42317. 40691. 41544. 41440. 41111. 42395. 40857.]
 [41714. 40661. 41421. 42129. 42115. 42189. 41941. 41541. 41957. 42574.]
 [41236. 40527. 41599. 40372. 40897. 41287. 41953. 40968. 41700. 42033.]]

For low values of (X, Y, Z) the process is quite fast, but usually I work with large sample sets, which makes the solution poorly optimized.

So, despite the fact the code above is working for me, I believe there is another (better) way to get the same results, maybe using matmul or tensordot, but I couldn't figure out how to properly use those functions.

CodePudding user response：

Easiest way is to do it in two steps, like a manual dot product:

c = (a * b).sum(1)

Or if you wanna get fancy and speed it up a bit:

c = np.einsum('ijk,ijl->ik', a, b)

Output:

array([[40223, 42505, 41040, 40772, 41213, 40311, 41813, 41632, 40578,
        40859],
       [40984, 42119, 41512, 40948, 40725, 40222, 41182, 42255, 41916,
        41948],
       [41824, 41908, 42118, 39690, 40537, 41394, 42446, 41598, 40710,
        42171],
       [41664, 41949, 40847, 39915, 41888, 41565, 40992, 41354, 41227,
        41948],
       [42766, 41490, 41291, 42317, 40691, 41544, 41440, 41111, 42395,
        40857],
       [41714, 40661, 41421, 42129, 42115, 42189, 41941, 41541, 41957,
        42574],
       [41236, 40527, 41599, 40372, 40897, 41287, 41953, 40968, 41700,
        42033]])

CodePudding user response：

@fsl gave this einsum:

np.einsum('ijk,ijl->ik',a,b)

with a bit of transpose, you can place the j, sum-of-products dimension in the standard dot order (last of A, 2nd to the last of B):

np.einsum('ikj,ijl->ik',a.transpose(0,2,1),b)

Which can be used with matmul:

np.matmul(a.transpose(0,2,1),b).squeeze()

the squeeze is needed to remove the trailing size 1 dimension (the l that was 'omitted' in the einsum.

tensordot cannot be used since it will do an "outer" product on the leading dimensions.

In [8]: np.matmul(a.transpose(0,2,1),b).shape
Out[8]: (7, 10, 1)
In [9]: np.dot(a.transpose(0,2,1),b).shape
Out[9]: (7, 10, 7, 1)

In this case the einsum is nearly as good as matmul, possibly because the j dimension is relatively large.

In [10]: timeit np.matmul(a.transpose(0,2,1),b).squeeze()
291 µs ± 632 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [11]: timeit np.einsum('ijk,ijl->ik',a,b)
337 µs ± 77.4 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [12]: timeit np.einsum('ijk,ijl->ik',a,b,optimize=True)