Getting a single value from an N-dimensional histogram in numpy or scipy

Assume I have data like this:

x = np.random.randn(4, 100000)

and I fit a histogram:

hist = np.histogramdd(x, density=True)

What I want is to get the probability of a number g, e.g. g = 0.1. Assume some hypothetical function foo, then:

g = 0.1
prob = foo(hist, g)
print(prob)
>> 0.2223124214

How could I do something like this, i.e. get the probability back for a single number or a vector of numbers from a fitted histogram? Especially for a histogram that is N-dimensional.

Answer:

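np.histogramdd needs O(r^D) memory, where r is the number of bins per dimension and D is the number of dimensions, so unless you have a very large dataset or a very small number of dimensions you will get a poor estimate. Consider your example data: 100k points in 4-D space. The default histogram is 10 x 10 x 10 x 10, so it has 10k bins, only 10 points per bin on average. Note also that histogramdd expects one sample per row, i.e. an (N, D) array, so your (4, 100000) array has to be transposed first: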

import numpy as np

x = np.random.randn(4, 100000)
hist = np.histogramdd(x.transpose(), density=True)
np.mean(hist[0] == 0)  # fraction of empty bins

gives something around 0.77, meaning that about 77% of the bins in the histogram contain no points.
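If you do want to stay with the histogram, the hypothetical foo from the question amounts to finding the bin that contains a point and reading off its value. Below is a minimal sketch, assuming the (H, edges) tuple returned by np.histogramdd and query points given one per row (the same (N, D) layout histogramdd uses). The name histdd_density is made up for this example, and points outside the histogram range are simply assigned 0.

```python
import numpy as np


def histdd_density(hist, points):
    """Return the value of the bin containing each point (hypothetical helper).

    hist: the (H, edges) tuple returned by np.histogramdd
    points: array of shape (m, D), or (D,) for a single point
    Points outside the histogram range get 0.
    """
    H, edges = hist
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    m, d = pts.shape
    if d != H.ndim:
        raise ValueError(f"expected points with {H.ndim} coordinates, got {d}")
    idx = np.empty((m, d), dtype=np.intp)
    inside = np.ones(m, dtype=bool)
    for k, e in enumerate(edges):
        nbins = len(e) - 1
        # Index of the last edge <= value, i.e. the bin [e[i], e[i+1]) holding it.
        i = np.searchsorted(e, pts[:, k], side="right") - 1
        # histogramdd closes the last bin on the right, so the maximum edge belongs to it.
        i[pts[:, k] == e[-1]] = nbins - 1
        inside &= (i >= 0) & (i < nbins)
        idx[:, k] = np.clip(i, 0, nbins - 1)
    values = H[tuple(idx.T)]  # advanced indexing returns a copy
    values[~inside] = 0.0
    return values


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 100_000))
hist = np.histogramdd(x.T, density=True)  # one sample per row
print(histdd_density(hist, [0.1, 0.1, 0.1, 0.1]))          # single point
print(histdd_density(hist, [[0, 0, 0, 0], [10, 0, 0, 0]]))  # second point is out of range
```

With density=True the returned value is a density; multiplying it by the volume of the bin (the product of its bin widths) gives the probability that a sample falls into that bin. Given how many bins are empty, many lookups will simply return 0.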

You probably want to smooth the distribution. Unless you have a good reason not to, I would suggest using a Gaussian kernel density estimate (KDE), for example scipy.stats.gaussian_kde.
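As a sketch of what such an estimate computes (assuming unweighted data and scipy's default Scott's-rule bandwidth): with n samples x_i in d dimensions, one Gaussian kernel is centred on every sample and the kernels are averaged,

$$
\hat f(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{\sqrt{(2\pi)^d \det\Sigma}}
\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\mathbf{x}_i)^\top \Sigma^{-1} (\mathbf{x}-\mathbf{x}_i)\right),
\qquad \Sigma = h^2 \hat C, \quad h = n^{-1/(d+4)},
$$

where \hat C is the sample covariance of the data. Because every kernel has unbounded support, \hat f is positive everywhere, so there is no equivalent of an empty bin; the price is that each evaluation costs O(n) kernel evaluations.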

import numpy as np
import scipy.stats

x = np.random.randn(4, 100000)  # d x n array
f = scipy.stats.gaussian_kde(x)  # d-dimensional PDF
f([1, 2, 3, 4])  # evaluate the PDF at a given point
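gaussian_kde also accepts a (d, m) array with one column per query point, which covers the "vector of numbers" part of the question. Keep in mind that it returns a density, not a probability; the probability of a region is the integral of the density over it, which gaussian_kde.integrate_box computes for an axis-aligned box. The sketch below shows both and, as a sanity check, compares the KDE with a direct evaluation of the Gaussian-kernel average using the bandwidth-scaled covariance stored in kde.covariance. The helper kde_by_hand, the query points and the box bounds are illustrative assumptions, and a smaller sample is used to keep the check fast.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1_000))  # d x n array (smaller n than above, for speed)
kde = scipy.stats.gaussian_kde(x)

# Several query points at once: shape (d, m), one column per point.
pts = np.array([[0.1, 0.0, 1.0],
                [0.1, 0.0, 2.0],
                [0.1, 0.0, 3.0],
                [0.1, 0.0, 4.0]])
dens = kde(pts)  # densities, shape (m,)


def kde_by_hand(data, query, cov):
    """Hypothetical helper: average of Gaussian kernels with covariance `cov`
    centred on each column of `data`, evaluated at each column of `query`."""
    d, n = data.shape
    inv = np.linalg.inv(cov)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    diff = query[:, None, :] - data[:, :, None]  # (d, n, m): query_j - data_i
    mahal = np.einsum("aij,ab,bij->ij", diff, inv, diff)  # (n, m) squared distances
    return norm * np.exp(-0.5 * mahal).mean(axis=0)


by_hand = kde_by_hand(x, pts, kde.covariance)
print(dens)
print(np.allclose(dens, by_hand))

# Probability mass of the box [0, 0.2]^4 (a probability, unlike the densities above).
p_box = kde.integrate_box(np.zeros(4), np.full(4, 0.2))
print(p_box)
```

Evaluating many points in one call is much faster than looping in Python, but the cost still grows with n times the number of query points, so for very large datasets you may want to fit the KDE on a random subsample.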