I am playing around with numba to accelerate my code. I notice that the performance varies significantly when using np.inf instead of np.nan inside the function. Below I have attached three sample functions for illustration.
function1 is not accelerated by numba. function2 and function3 are both accelerated by numba, but one uses np.nan while the other uses np.inf.
On my machine, the average runtimes of the three functions are 0.032284s, 0.041548s and 0.019712s respectively. It appears that using np.nan is much slower than using np.inf. Why does the performance vary so significantly? Thanks in advance.
Edit: I am using Python 3.7.11 and Numba 0.55.0rc1.
import numpy as np
import numba as nb
def function1(array1, array2):
    nr, nc = array1.shape
    output1 = np.empty((nr, nc), dtype='float')
    output2 = np.empty((nr, nc), dtype='float')
    output1[:] = np.nan
    output2[:] = np.nan
    for r in range(nr):
        row1 = array1[r]
        row2 = array2[r]
        diff = row1 - row2
        id_threshold = np.nonzero((row1 - row2) > 8)
        output1[r][id_threshold] = 1
        output2[r][id_threshold] = 0
    output1 = output1.flatten()
    output2 = output2.flatten()
    id_keep = np.nonzero(output1 != np.nan)
    output1 = output1[id_keep]
    output2 = output2[id_keep]
    output = np.vstack((output1, output2))
    return output

@nb.njit('float64[:,::1](float64[:,::1], float64[:,::1])', parallel=True)
def function2(array1, array2):
    nr, nc = array1.shape
    output1 = np.empty((nr, nc), dtype='float')
    output2 = np.empty((nr, nc), dtype='float')
    output1[:] = np.nan
    output2[:] = np.nan
    for r in nb.prange(nr):
        row1 = array1[r]
        row2 = array2[r]
        diff = row1 - row2
        id_threshold = np.nonzero((row1 - row2) > 8)
        output1[r][id_threshold] = 1
        output2[r][id_threshold] = 0
    output1 = output1.flatten()
    output2 = output2.flatten()
    id_keep = np.nonzero(output1 != np.nan)
    output1 = output1[id_keep]
    output2 = output2[id_keep]
    output = np.vstack((output1, output2))
    return output

@nb.njit('float64[:,::1](float64[:,::1], float64[:,::1])', parallel=True)
def function3(array1, array2):
    nr, nc = array1.shape
    output1 = np.empty((nr, nc), dtype='float')
    output2 = np.empty((nr, nc), dtype='float')
    output1[:] = np.inf
    output2[:] = np.inf
    for r in nb.prange(nr):
        row1 = array1[r]
        row2 = array2[r]
        diff = row1 - row2
        id_threshold = np.nonzero((row1 - row2) > 8)
        output1[r][id_threshold] = 1
        output2[r][id_threshold] = 0
    output1 = output1.flatten()
    output2 = output2.flatten()
    id_keep = np.nonzero(output1 != np.inf)
    output1 = output1[id_keep]
    output2 = output2[id_keep]
    output = np.vstack((output1, output2))
    return output

array1 = 10*np.random.random((1000,1000))
array2 = 10*np.random.random((1000,1000))
output1 = function1(array1, array2)
output2 = function2(array1, array2)
output3 = function3(array1, array2)
CodePudding user response:
The second one is much slower because output1 != np.nan is True for every element: np.nan != np.nan is True, and v != np.nan is also True for any other value v. As a result, id_keep selects every position, and output1[id_keep] is simply a full copy of output1. The arrays processed afterwards are therefore much bigger, which makes the execution slower.
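To make this concrete, here is a small NumPy-only sketch. The array `a` is made up for illustration and merely stands in for the flattened output1; it shows that the != filter keeps every element, including the NaN entries.

```python
import numpy as np

# Hypothetical stand-in for the flattened output1: mostly NaN, a few real values.
a = np.array([np.nan, 1.0, np.nan, 1.0, np.nan])

# Element-wise comparison against NaN: every entry compares as "not equal",
# including the NaN entries themselves.
mask = a != np.nan
id_keep = np.nonzero(mask)

# Same length as `a`, so nothing was actually filtered out.
kept = a[id_keep]
```

Printing `mask.all()` and `kept.shape` is a quick way to confirm that the filter is a no-op.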
The point is that you must never compare a value to np.nan using comparison operators; use np.isnan(value) instead. In your case, you should use np.logical_not(np.isnan(output1)).
Once corrected, the second implementation may still be slightly slower due to the temporary array created by np.logical_not, although I did not see any statistically significant difference on my machine between using NaN and Inf with the corrected code.
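As a rough sketch of the corrected filtering step, in plain NumPy and outside of Numba: the helper name `keep_non_nan` and the sample arrays are assumptions for illustration, not part of the original code.

```python
import numpy as np

def keep_non_nan(output1, output2):
    """Hypothetical helper: keep only the positions where output1 is not NaN."""
    # np.isnan is True exactly at the NaN entries; logical_not inverts that mask.
    id_keep = np.nonzero(np.logical_not(np.isnan(output1)))
    return np.vstack((output1[id_keep], output2[id_keep]))

# Made-up flattened arrays mimicking the layout produced by the loop.
output1 = np.array([np.nan, 1.0, np.nan, 1.0])
output2 = np.array([np.nan, 0.0, np.nan, 0.0])
result = keep_non_nan(output1, output2)
```

Here only the two non-NaN positions survive, so `result` has two columns instead of four. The same `id_keep` line should be usable as a replacement for the comparison in function2, though I have only checked it here in plain NumPy.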
