I would like to calculate the log-ratios for my 2D array, e.g.
a = np.array([[3,2,1,4], [2,1,1,6], [1,5,9,1], [7,8,2,2], [5,3,7,8]])
The formula is ln(x/g(x)), where g(x) is the geometric mean of each row. I execute it like this:
logvalues = np.array(a, dtype=float)  # the values will be overwritten by the code below
for i in range(len(a)):
    row = np.array(a[i])
    geo_mean = row.prod()**(1.0/len(row))
    flr = lambda x: math.log(x/geo_mean)
    logvalues[i] = np.array([flr(x) for x in row])
I was wondering if there is any way to vectorise the above lines (preferably without introducing other modules) to make it more efficient?
CodePudding user response:
This should do the trick:
geo_means = a.prod(1)**(1/a.shape[1])
logvalues = np.log(a/geo_means[:, None])
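To see why the [:, None] is needed: a.prod(1) has shape (5,), and adding a trailing axis turns it into shape (5, 1), so each row of a is divided by its own geometric mean. Here is a minimal sketch using the array from the question. The log-space variant at the end is my own addition, not part of the answer above: it relies on ln(x/g(x)) = ln(x) - mean(ln(x)) and may be preferable if the row products could overflow.

```python
import numpy as np

a = np.array([[3, 2, 1, 4], [2, 1, 1, 6], [1, 5, 9, 1], [7, 8, 2, 2], [5, 3, 7, 8]])

# One geometric mean per row: shape (5,)
geo_means = a.prod(axis=1) ** (1 / a.shape[1])

# geo_means[:, None] has shape (5, 1), so it broadcasts across the 4 columns
logvalues = np.log(a / geo_means[:, None])

# Assumed alternative (not from the answer): work in log space, which
# avoids forming the potentially very large row product at all
log_a = np.log(a)
logvalues_alt = log_a - log_a.mean(axis=1, keepdims=True)

print(np.allclose(logvalues, logvalues_alt))
```

Both forms require strictly positive entries, since the logarithm of zero or a negative number is not a finite real value.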
CodePudding user response:
Another way to do this is to write the function as though for a single 1-D array, ignoring the 2-D aspect:
def f(x):
    return np.log(x / x.prod()**(1.0 / len(x)))
Then if you want to apply it to all rows in a 2-D array (or N-D array):
>>> np.apply_along_axis(f, 1, a)
array([[ 0.30409883, -0.10136628, -0.79451346, 0.5917809 ],
[ 0.07192052, -0.62122666, -0.62122666, 1.17053281],
[-0.95166562, 0.65777229, 1.24555895, -0.95166562],
[ 0.59299864, 0.72653003, -0.65976433, -0.65976433],
[-0.07391256, -0.58473818, 0.26255968, 0.39609107]])
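As a quick sanity check (a sketch, not a benchmark), the row-wise function can be compared against a direct broadcast expression. The names result and direct are just illustrative. Each row of log-ratios should also sum to roughly zero, because the values are centred on the log of the row's geometric mean.

```python
import numpy as np

a = np.array([[3, 2, 1, 4], [2, 1, 1, 6], [1, 5, 9, 1], [7, 8, 2, 2], [5, 3, 7, 8]])

def f(x):
    # Log-ratio of a single 1-D array against its geometric mean
    return np.log(x / x.prod() ** (1.0 / len(x)))

# Apply f to every row (axis 1 is the axis each 1-D slice runs along)
result = np.apply_along_axis(f, 1, a)

# Same computation written as one broadcast expression
direct = np.log(a / (a.prod(axis=1) ** (1.0 / a.shape[1]))[:, None])

print(np.allclose(result, direct))
print(np.allclose(result.sum(axis=1), 0.0))
```

Keep in mind that np.apply_along_axis calls f once per row from Python, so it is mainly a convenience for reusing 1-D functions rather than a speed-up over a fully broadcast expression.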
Some other general notes on your attempt:
- for i in range(len(a)): To loop over all rows of an array, it's generally faster to write simply for row in a. NumPy can optimize this case somewhat, whereas with for idx in range(len(a)) you have to index the array again with a[idx] on every iteration, which is slower. Even then, it's better to avoid a for loop altogether where possible, as you already know.
- row = np.array(a[i]): The np.array() call isn't necessary. Indexing a multi-dimensional array already returns an array.
- lambda x: math.log(x/geo_mean): Don't use math functions with NumPy arrays; use the equivalents in the numpy module. Wrapping this in a function adds unnecessary overhead as well. Since you use it as [flr(x) for x in row], it is equivalent to the already vectorized NumPy operation np.log(row / geo_mean).
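Putting those notes together, here is a rough sketch of what the original loop might look like if you did want to keep it as a loop. The use of enumerate and of np.empty to preallocate a float result are my own choices for illustration. The fully vectorized version is still preferable.

```python
import numpy as np

a = np.array([[3, 2, 1, 4], [2, 1, 1, 6], [1, 5, 9, 1], [7, 8, 2, 2], [5, 3, 7, 8]])

# Preallocate a float array so the log values are not truncated to integers
logvalues = np.empty(a.shape, dtype=float)

for i, row in enumerate(a):          # iterate over rows directly
    geo_mean = row.prod() ** (1.0 / len(row))
    logvalues[i] = np.log(row / geo_mean)   # vectorized over the whole row

print(logvalues)
```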
