Home > Enterprise >  Vectorization of numpy matrix that contains pdf of multiple gaussians and multiple samples
Vectorization of numpy matrix that contains pdf of multiple gaussians and multiple samples

Time:01-12

My problem is this: I have GMM model with K multi-variate gaussians, and also I have N samples. I want to create a N*K numpy matrix, which in it's [i,k] cell there is the pdf function of the k'th gaussian on the i'th sample, i.e. in this cell there is In short, I'm intrested in the following matrix: pdf matrix

This what I have now (I'm working with python):

Q = np.array([scipy.stats.multivariate_normal(mu_t[k], cov_t[k]).pdf(X) for k in range(self.K)]).T

X in the code is a matrix whose lines are my samples.

It's works fine on small toy dataset from small dimension, but the dataset I'm working with is 10,000 28*28 pictures, and on it this line run extremely slowly...

I want to find a solution that doesn't envolve loops but only vector\matrix operation (i.e. vectorization). The scipy 'multivariate_normal' function cannot parameters of more than 1 gaussians, as far as I understand it (but it's 'pdf' function can calculates on multiple samples at once).

Does someone have an Idea?

CodePudding user response:

I am afraid, that the main speed killer in your problem is the inversion and deteminant calculation for the cov_t matrices. If you somehow managed to precalculate these, you could enroll the calculation and use np.add.outer to get all combinations of x_i - mu_k and then use array comprehension to calculate the probabilities with the full formula of the normal distribution function.

Try

S = np.add.outer(X,-mu_t)
cov_t_inv = ??
cov_t_inv_det = ??
Q = 1/(2*np.pi*cov_t_inv_det)**0.5 * np.exp(-0.5*np.einsum('ikr,krs,kis->ik',S,cov_t_inv,S))

Where you insert precalculated arrays cov_t_inv for the inverse covariance matrices and cov_t_inv_det for their determinants.

  •  Tags:  
  • Related