I have a 2D NumPy array (called old_array) that looks like this:
1 2 3 4 5 6 7 8 9
4 3 5 1 6 7 3 2 8
8 3 4 1 8 3 2 9 3
7 3 5 8 2 5 9 2 6
I am trying to select only the values greater than 4 in each row, and would like output like this:
5 6 7 8 9
5 6 7 8
8 8 9
7 5 8 5 9 6
I tried new_array = old_array[old_array>4], but it does not give the desired result. Can anyone explain what I am doing wrong, or how this should be done? I would also like to avoid loops if possible.
Thanks.
CodePudding user response:
If these are NumPy arrays and you'd like to get the indices of the elements greater than 4, you can use
np.argwhere(array > 4)
This function returns another array containing the indices of the original array where the condition is True.
>>> np.argwhere(array > 4)
array([[0, 4],
[0, 5],
[0, 6],
[0, 7],
[0, 8],
[1, 2],
[1, 4],
[1, 5],
[1, 8],
[2, 0],
[2, 4],
[2, 7],
[3, 0],
[3, 2],
[3, 3],
[3, 5],
[3, 6],
[3, 8]])
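As a minimal sketch of how those index pairs could be used, the snippet below looks the matching values up again from the (row, column) pairs. The data is copied from the question; the names idx and values are only illustrative.

```python
import numpy as np

# Same data as in the question; the name `array` follows this answer.
array = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                  [4, 3, 5, 1, 6, 7, 3, 2, 8],
                  [8, 3, 4, 1, 8, 3, 2, 9, 3],
                  [7, 3, 5, 8, 2, 5, 9, 2, 6]])

# One (row, column) pair per element greater than 4, in row-major order.
idx = np.argwhere(array > 4)

# Use the row and column indices together to fetch the matching values.
values = array[idx[:, 0], idx[:, 1]]
print(values)
```

Note that values is still a flat 1D array; the first column of idx tells you which row each value came from.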
Otherwise, evaluate the comparison directly:
>>> array > 4
array([[False, False, False, False, True, True, True, True, True],
[False, False, True, False, True, True, False, False, True],
[ True, False, False, False, True, False, False, True, False],
[ True, False, True, True, False, True, True, False, True]])
In this case, you obtain a boolean mask that you can loop over.
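A rough sketch of what looping over the mask might look like is shown here. It does use a Python-level loop (a list comprehension), which the question wanted to avoid, so treat it as an illustration rather than a recommendation. The names mask and rows are illustrative.

```python
import numpy as np

# Same data as in the question.
array = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                  [4, 3, 5, 1, 6, 7, 3, 2, 8],
                  [8, 3, 4, 1, 8, 3, 2, 9, 3],
                  [7, 3, 5, 8, 2, 5, 9, 2, 6]])

mask = array > 4

# Pair each row with its own mask row and keep only the selected values.
# The rows end up with different lengths, so the result is a plain Python
# list of 1D arrays rather than a single 2D array.
rows = [row[m] for row, m in zip(array, mask)]

for r in rows:
    print(r)
```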
Hope it works
CodePudding user response:
You are trying to force NumPy to work in a way it was not designed for: it does not support arrays whose rows have unequal lengths. That is why old_array[old_array>4] returns a flat 1D array of all matching values instead of one row per input row.
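To make that concrete, here is a small sketch using the data from the question. It only checks the shape of the result of boolean-mask indexing, which is one-dimensional.

```python
import numpy as np

# Same data as in the question.
old_array = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                      [4, 3, 5, 1, 6, 7, 3, 2, 8],
                      [8, 3, 4, 1, 8, 3, 2, 9, 3],
                      [7, 3, 5, 8, 2, 5, 9, 2, 6]])

# Boolean-mask indexing collects every matching element into one 1D array,
# so the information about which row each value came from is lost.
new_array = old_array[old_array > 4]

print(old_array.shape)  # (4, 9)
print(new_array.shape)  # (18,)
```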
With that caveat in mind, you can still do it using np.split, but this is generally slow.
Here is one way to do it:
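import numpy as np
x = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
              [4, 3, 5, 1, 6, 7, 3, 2, 8],
              [8, 3, 4, 1, 8, 3, 2, 9, 3],
              [7, 3, 5, 8, 2, 5, 9, 2, 6]])
trues = x > 4  # boolean mask of the elements greater than 4
row_lengths = np.sum(trues, axis=1)  # number of matches in each row
split_idx = np.cumsum(row_lengths[:-1])  # positions at which to cut the flat result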
>>> np.split(x[trues], split_idx)
[array([5, 6, 7, 8, 9]),
array([5, 6, 7, 8]),
array([8, 8, 9]),
array([7, 5, 8, 5, 9, 6])]
The intermediate values show how it works:
>>> trues
array([[False, False, False, False, True, True, True, True, True],
[False, False, True, False, True, True, False, False, True],
[ True, False, False, False, True, False, False, True, False],
[ True, False, True, True, False, True, True, False, True]])
>>> x[trues]
array([5, 6, 7, 8, 9, 5, 6, 7, 8, 8, 8, 9, 7, 5, 8, 5, 9, 6])
>>> row_lengths
array([5, 4, 3, 6])
>>> split_idx
array([ 5, 9, 12], dtype=int32)
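For reuse, the steps above could be wrapped in a small function. This is only a sketch: select_per_row and its threshold parameter are hypothetical names, not part of NumPy.

```python
import numpy as np


def select_per_row(arr, threshold):
    """Hypothetical helper: list of 1D arrays holding each row's values > threshold."""
    mask = arr > threshold
    # Number of selected values in each row.
    counts = mask.sum(axis=1)
    # Cut the flat result after each row except the last one.
    cut_points = np.cumsum(counts[:-1])
    return np.split(arr[mask], cut_points)


x = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
              [4, 3, 5, 1, 6, 7, 3, 2, 8],
              [8, 3, 4, 1, 8, 3, 2, 9, 3],
              [7, 3, 5, 8, 2, 5, 9, 2, 6]])

result = select_per_row(x, 4)
for r in result:
    print(r)

# A row with no matching values yields an empty array in its position.
sparse = select_per_row(x, 8)
```

With a threshold of 8, the second row has no matches, so the second entry of sparse is an empty array while the list still has one entry per input row.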
