I'm currently reading reviewing code. I have a double loop to filter a 2D array according to a 1D array. Here is the code:
import numpy as np
size_a = 500
size_b = 2000
a = np.random.rand(size_a)
c = np.random.rand(size_a*size_b).reshape((size_a, size_b))
d = np.random.rand(size_b)
# double loop for filtering c according to d
for i in range(size_b):
for j in range(size_a):
if a[j] <= d[i]:
c[j,i] = 0
# first improvement
for i in range(size_b):
c[a <= d[i], i] = 0
Can we do better ? In terms of speed of execution.
CodePudding user response:
Let's describe what you want to do in words:
- You have a matrix
cof shape(size_a, size_b). - You have a vector
awith one element per row ofc - You have a vector
dwith one element per column ofc
In those locations where a[i] <= d[j], you want to set c to zero.
Let's say we have:
size_a = 3
size_b = 5
a = np.array([23, 55, 37])
c = np.array([[56, 37, 50, 49, 57],
[81, 50, 98, 11, 9],
[52, 47, 23, 64, 20]])
d = np.array([27, 16, 74, 95, 8])
I obtained these using np.random.randint(0, 100, shape) for each of the arrays, but I've hardcoded them here for ease of reproduction and understanding
You can compare the vectors a and d at every combination of i and j two ways:
- Cast them into matrices of the same size:
a_cast = np.tile(a, (size_b, 1)).T
# array([[23, 23, 23, 23, 23],
# [55, 55, 55, 55, 55],
# [37, 37, 37, 37, 37]])
d_cast = np.tile(d, (size_a, 1))
# array([[27, 16, 74, 95, 8],
# [27, 16, 74, 95, 8],
# [27, 16, 74, 95, 8]])
mask = a_cast <= d_cast
# array([[ True, False, True, True, False],
# [False, False, True, True, False],
# [False, False, True, True, False]])
c[mask] = 0
- Have numpy do the broadcasting for you, by making
aa(size_a, 1)matrix andda(1, size_b)matrix. This is done by taking a slice of each vector, with an extraNoneaxis, which creates an extra axis of length 1 (see In numpy, what does selection by [:,None] do?):
mask = a[:, None] <= d[None, :]
# a[:, None] gives:
# array([[23],
# [55],
# [37]])
# d[None, :] gives:
# array([[27, 16, 74, 95, 8]])
c[mask] = 0
Both these methods result in the same c:
array([[ 0, 37, 0, 0, 57],
[81, 50, 0, 0, 9],
[52, 47, 0, 0, 20]])
