Vectorizing matrix operations using numpy-CodePudding

I wrote the code below that uses a for loop. I would like to ask if there is a way to vectorize the operation within the second for loop since I intend to work with larger matrices.

import numpy as np

num = 5

A = np.array([[1,2,3,4,5], [4,5,6,4,5], [7,8,9,4,5], [10,11,12,4,5], [13,14,15,4,5]])
sm_factor = np.array([0.1 ,0.1, 0.1, 0.1, 0.1])


d2m = np.zeros((num, num))
d2m[0, 0] = 2
d2m[0, 1] = -5
d2m[0, 2] = 4
d2m[0, 3] = -1

for k in range(1, num-1):
    d2m[k, k-1] = 1
    d2m[k, k] = -2
    d2m[k, k 1] = 1

d2m[num-1, num-4] = -1
d2m[num-1, num-3] = 4
d2m[num-1, num-2] = -5
d2m[num-1, num-1] = 2

x_smf  = 0
for i in range(len(sm_factor)):
    x_smf = x_smf   sm_factor[i] * (d2m @ (A[i, :]).T).T @ (d2m @ (A[i, :]).T)

x_smf
# 324.0

CodePudding user response：

You can avoid loops for both d2m matrix creation and x_smf vector computation using sps.diags for the creation of a sparse tridiagonal matrix that you can cast to array to be able to edit the first and last lines. Your code will look like this (note that the result of diags in line 10 has been cast to a dense ndarray using scipy.sparse.dia_matrix.toarray method):

import numpy as np
import scipy.sparse as sps

# Dense tridiagonal matrix
d2m = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).toarray() # cast to array
# First line boundary conditions
d2m[0, 0] = 2
d2m[0, 1] = -5
d2m[0, 2] = 4
d2m[0, 3] = -1
# Last line boundary conditions
d2m[num-1, num-4] = -1
d2m[num-1, num-3] = 4
d2m[num-1, num-2] = -5
d2m[num-1, num-1] = 2

The solution proposed by Valdi_Bo enables you to remove the second FOR loop:

x_smf = np.sum(sm_factor * np.square(d2m @ A.T).sum(axis=0))

However, I want to attract your attention on the fact that the x_smf matrix is sparse and storing it as a dense ndarray is bad for both computation time and memory storage. Instead of casting to dense ndarray, I advise you to cast to a sparse matrix format. For example lil_matrix, which is a list of lists sparse matrix format, using tolil() method instead of toarray():

# Sparse tridiagonal matrix
d2m_s = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).tolil() # cast to lil

Here is a script that compares the three implementations on a bigger case num=4000 (for num=5 all give 324). For this size, I am already seeing benefits of using sparse matrix, here is the whole script (the first lines are a generalisation of the code for num different from 5):

from time import time
import numpy as np
import scipy.sparse as sps

num = 4000

A = np.concatenate([np.arange(1, (num-2)*num 1).reshape(num, num-2), np.repeat([[4, 5]], num, axis=0)], axis=1)
sm_factor = 0.1*np.ones(num)

########## DENSE matrix   FOR loop ##########

d2m = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).toarray() # cast to array
# First line boundary conditions
d2m[0, 0] = 2
d2m[0, 1] = -5
d2m[0, 2] = 4
d2m[0, 3] = -1
# Last line boundary conditions
d2m[num-1, num-4] = -1
d2m[num-1, num-3] = 4
d2m[num-1, num-2] = -5
d2m[num-1, num-1] = 2

# FOR loop version
t_start = time()
x_smf  = 0
for i in range(len(sm_factor)):
    x_smf = x_smf   sm_factor[i] * (d2m @ (A[i, :]).T).T @ (d2m @ (A[i, :]).T)
print(f'FOR loop version time:           {time()-t_start}s')
print(f'FOR loop version value:          {x_smf}\n')

########## DENSE matrix   VECTORIZED ##########

t_start = time()
x_smf_v = np.sum(sm_factor * np.square(d2m @ A.T).sum(axis=0))
print(f'VECTORIZED version time:         {time()-t_start}s')
print(f'VECTORIZED version value:        {x_smf_v}\n')

########## SPARSE matrix   VECTORIZED ##########

d2m_s = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).tolil() # cast to lil
# First line boundary conditions
d2m_s[0, 0] = 2
d2m_s[0, 1] = -5
d2m_s[0, 2] = 4
d2m_s[0, 3] = -1
# Last line boundary conditions
d2m_s[num-1, num-4] = -1
d2m_s[num-1, num-3] = 4
d2m_s[num-1, num-2] = -5
d2m_s[num-1, num-1] = 2

t_start = time()
x_smf_s = np.sum(sm_factor * np.square(d2m_s @ A.T).sum(axis=0))
print(f'SPARSE VECTORIZED version time:  {time()-t_start}s')
print(f'SPARSE VECTORIZED version value: {x_smf_s}\n')

Here is what I get when running the code:

FOR loop version time:           25.878241777420044s
FOR loop version value:          3.752317536763356e 17

VECTORIZED version time:         1.0873610973358154s
VECTORIZED version value:        3.752317536763356e 17

SPARSE VECTORIZED version time:  0.37279224395751953s
SPARSE VECTORIZED version value: 3.752317536763356e 17

As you can see the use of a sparse matrix makes you win another factor 3 on computation time and doesn't require you to adapt the code coming afterwards. It is also a good strategy to test the various scipy implementations of sparse matrices (tocsc(), tocsr(), todok() etc.), some may be more adapted to your case.

CodePudding user response：

After some research and printout of intermediate results of your loop, I found the solution:

x_smf = np.sum(sm_factor * np.square(d2m @ A.T).sum(axis=0))

The result is:

324.0

By the way: Creation of dm2 can be shortened to:

d2m = np.zeros((num, num), dtype='int')
d2m[0, :4] = [ 2, -5,  4, -1]
for k in range(1, num-1):
    d2m[k, k-1:k 2] = [ 1, -2,  1]
d2m[-1, -4:] = [-1, 4, -5, 2]