efficient way of adding up subarrays to create a big array in python-CodePudding

Let N be bigger than k, M be the N by N array with 0 entries and L be a list of lists of k by k arrays (edit: L is essentially a matrix where each entry is a matrix). I want to iterate through the pairs (i, j) with 1 <= i,j <= N and add up a k by k matrix to M[i:i k, j:j k], as in the following Python code:

for i in range(N - k   1):
    for j in range(N - k   1):
        M[i:i k, j:j k]  = L[i][j]

What would be a Pythonic & computationally efficient way of implementing this?

Edit: example N=3, k=2,

L[0,0] = [1,2]
         [3,4]    

L[0,1] = [5,6]
         [7,8]

L[1,0] = [9,10]
         [11,12]      

L[1,1] =[13,14]    
        [15,16].

initially, M is

    [ 0, 0, 0]
    [ 0, 0, 0]
    [ 0, 0, 0]

at i=j=0, M becomes

[ 1, 2, 0]   
[ 3, 4, 0]    
[ 0, 0, 0]

at i=0, j=1, M becomes

      [0,5,6] 
 M  = [0,7,8]
      [0,0,0]

CodePudding user response：

So with your example:

In [30]: N, k = 3,2
In [31]: nk = N-k 1
In [32]: L = np.arange(1,(nk*nk*k*k) 1).reshape(nk,nk,k,k)
In [40]: M = np.zeros((N,N),int)
    ...: for i in range(nk):
    ...:     for j in range(nk):
    ...:         M[i:i k, j:j k]  = L[i,j]
    ...: 
In [41]: M
Out[41]: 
array([[ 1,  7,  6],
       [12, 34, 22],
       [11, 27, 16]])

I can recreate this with sliding windows:

In [74]: M1 = np.zeros((N,N),int)
In [75]: M2 = np.lib.stride_tricks.sliding_window_view(M1,(2,2), writeable=True
    ...: )
In [76]: for i in range(nk):
    ...:     for j in range(nk):
    ...:         M2[i, j]  = L[i,j]
    ...: 
In [77]: M2
Out[77]: 
array([[[[ 1,  7],
         [12, 34]],

        [[ 7,  6],
         [34, 22]]],


       [[[12, 34],
         [11, 27]],

        [[34, 22],
         [27, 16]]]])
In [78]: M1
Out[78]: 
array([[ 1,  7,  6],
       [12, 34, 22],
       [11, 27, 16]])

This has the same sort of iteration. The next step would be to use np.add.at to perform the iterative addition.

< more work to come >

I can't do the "obvious" = addition because of the buffering that add.at explains:

In [79]: M1 = np.zeros((N,N),int)
In [80]: M2 = np.lib.stride_tricks.sliding_window_view(M1,(2,2), writeable=True
    ...: )
In [81]: M2  = L
In [82]: M1
Out[82]: 
array([[ 1,  5,  6],
       [ 9, 13, 14],
       [11, 15, 16]])

With @mathfux suggestion:

In [83]: x, y, z, t = np.indices(L.shape)
In [84]: M1 = np.zeros((N,N),int)
In [85]: M2 = np.lib.stride_tricks.sliding_window_view(M1,(2,2), writeable=True
    ...: )
In [86]: np.add.at(M2, (x,y,z,t), L)
In [87]: M1
Out[87]: 
array([[ 1,  7,  6],
       [12, 34, 22],
       [11, 27, 16]])

Or without explicitly naming the 4 indices arrays (doesn't really matter since L will always be 4d, regardless of N and k).

In [100]: np.add.at(M2, tuple(np.indices(L.shape)), L)
In [101]: M1
Out[101]: 
array([[ 1,  7,  6],
       [12, 34, 22],
       [11, 27, 16]])

The add.at docs mention slices, so may be possible to use it with M directly, rather than via the sliding_windows.

CodePudding user response：

There is no need to pass N and k explicitly because a shape of resulting array is well defined once you know L. I'll just refactor @hpaulj solution in order to get rid of these excessive values:

def kernel_array_sum(kernel_array, dtype=int):
    N, k = np.array(kernel_array.shape[:2]), np.array(kernel_array.shape[2:])
    M = np.zeros(N   k - 1, dtype=dtype)
    A = np.lib.stride_tricks.sliding_window_view(M, k)
    x, y, z, t = np.indices(A.shape)
    np.add.at(A, (x,y,z,t), kernel_array)
    return M

L = np.array([[[[ 1, 2], [ 3, 4]], [[ 5, 6], [ 7, 8]]], 
              [[[ 9,10], [11,12]], [[13,14], [15,16]]],
              [[[17,18], [19,20]], [[21,22], [23,24]]]])

>>> kernel_array_sum(L)
array([[ 1,  7,  6],
       [12, 34, 22],
       [28, 66, 38],
       [19, 43, 24]])

Now this was rather educational. If you want to it in a more performant way numba is a better option:

from numba import njit

@njit
def _nkas(M, L):
    for i in range(L.shape[0]):
        for j in range(L.shape[1]):
            M[i:i k[0], j:j k[1]]  = L[i][j]
            
def numba_kernel_array_sum(kernel_array):
    N, k = np.array(kernel_array.shape[:2]), np.array(kernel_array.shape[2:])
    M = np.zeros(N   k - 1, dtype=int)
    _nkas(M, kernel_array)
    return 
>>> numba_kernel_array_sum(L)
array([[ 1,  7,  6],
       [12, 34, 22],
       [28, 66, 38],
       [19, 43, 24]])

Note that parallel=True option is not available since items of M are reassigned multiple times which is not possible in parallel. Although it's still faster near 4x times on my computer than numpy solution.