Efficient way to create the probability distribution of a list of numbers with numpy

Time:01-26

This is an example of what I am trying to do. Consider the following NumPy array:

import numpy as np

A = np.array([3, 0, 1, 5, 7])  # in practice, A is a huge array of floats: A.shape[0] >= 1000000

I need the fastest possible way to get the following result:

result = []

for a in A:
    result.append( 1 / np.exp(A - a).sum() )

result = np.array(result)

print(result)

>>> [1.58297157e-02 7.88115138e-04 2.14231906e-03 1.16966657e-01 8.64273193e-01]

Option 1 (faster than previous code):

result = 1 / np.exp(A - A[:,None]).sum(axis=1)

print(result)

>>> [1.58297157e-02 7.88115138e-04 2.14231906e-03 1.16966657e-01 8.64273193e-01]

Is there a faster way to get "result"?

EDIT: yes, scipy.special.softmax did the trick

CodePudding user response:

Rather than trying to compute each value by normalizing it in place (effectively adding up all the values, repeatedly for each value), instead just get the exponentials and then normalize once at the end. So:

raw = np.exp(A)
result = raw / sum(raw)

(In my testing, the builtin sum is over 2.5x as fast as np.sum for summing a small array. I did not test with larger ones.)
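To spell out why normalizing once gives the same numbers: 1 / sum_j exp(A[j] - a) equals exp(a) / sum_j exp(A[j]), so each entry is just its own exponential divided by one shared total. Below is a minimal sketch of that idea, checked against the loop from the question. The helper name softmax_once is made up for illustration, and subtracting the maximum before exponentiating is an extra step I am assuming you want (it guards against overflow for large floats); it is not part of the answer above.

```python
import numpy as np

def softmax_once(values):
    """Hypothetical helper: exponentiate once, then normalize once."""
    values = np.asarray(values, dtype=float)
    # Shifting by the maximum leaves the ratios unchanged but keeps
    # np.exp from overflowing on large inputs (assumed extra, not from the answer).
    shifted = np.exp(values - values.max())
    return shifted / shifted.sum()

A = np.array([3, 0, 1, 5, 7])

# Reference values: the loop from the question.
expected = np.array([1 / np.exp(A - a).sum() for a in A])

result = softmax_once(A)
print(np.allclose(result, expected))
```

This needs only O(n) memory, whereas the broadcast version in the question builds an n-by-n intermediate. The sketch uses the array's own .sum() on the assumption that a vectorized reduction wins at the question's array sizes; it is worth timing both on your own data.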

CodePudding user response:

Yes: scipy.special.softmax did the trick

from scipy.special import softmax

result = softmax(A)

Thank you @j1-lee and @Karl Knechtel
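For anyone who wants to confirm the equivalence before switching, here is a small check (a sketch on the toy array from the question, not a benchmark) that scipy.special.softmax returns the same values as the original loop. The variable names loop_result and soft_result are just for illustration.

```python
import numpy as np
from scipy.special import softmax

A = np.array([3, 0, 1, 5, 7])

# The slow reference from the question: one full pass over A per element.
loop_result = np.array([1 / np.exp(A - a).sum() for a in A])

# A single call; with the default axis=None it normalizes over the whole array.
soft_result = softmax(A)

print(np.allclose(soft_result, loop_result))
print(soft_result.sum())  # should be 1 up to floating-point rounding
```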
