rows is a 343x30 matrix of real numbers. I'm trying to append row vectors from rows to true_rows and false_rows, but it only adds the first row and doesn't do anything afterwards. I've tried vstack, and I also tried wrapping example in a 2D array ([example]), but that crashed my PyCharm. What can I do?
true_rows = []
true_labels = []
false_rows = []
false_labels = []
i = 0
for example in rows:
    if question.match(example):
        true_rows = np.append(true_rows , example , axis=0)
        true_labels.append(labels[i])
    else:
        #false_rows = np.vstack(false_rows, example_t)
        false_rows = np.append(false_rows, example, axis=0)
        false_labels.append(labels[i])
    i += 1
Answer:
You can append your rows to a plain Python list and then convert that list to a NumPy array, for example:
import numpy as np

exemple1 = np.array([1,2,3,4,5])
exemple2 = np.array([6,7,8,9,10])
exemple3 = np.array([11,12,13,14,15])
true_rows = []
true_rows.append(exemple1)
true_rows.append(exemple2)
true_rows.append(exemple3)
true_rows = np.array(true_rows)
true_rows is now:
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])
You can also use np.concatenate if you want a one-dimensional array instead:
true_rows = np.concatenate(true_rows, axis=0)
which gives:
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
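Applied to the loop in the question, the list-based approach might look like the sketch below. It assumes rows is a 2-D NumPy array and labels is a sequence of the same length; is_true_row is a hypothetical stand-in for question.match, which the question doesn't show, and the random data only imitates the 343x30 shape.

```python
import numpy as np

def is_true_row(example):
    # Hypothetical predicate standing in for question.match(example)
    return example[0] > 0

def split_rows(rows, labels, predicate):
    true_rows, true_labels = [], []
    false_rows, false_labels = [], []
    # zip pairs each row with its label, replacing the manual counter i
    for example, label in zip(rows, labels):
        if predicate(example):
            true_rows.append(example)
            true_labels.append(label)
        else:
            false_rows.append(example)
            false_labels.append(label)
    # Convert once at the end; reshape keeps a (0, ncols) shape if a list is empty
    ncols = rows.shape[1]
    true_rows = np.array(true_rows).reshape(-1, ncols)
    false_rows = np.array(false_rows).reshape(-1, ncols)
    return true_rows, true_labels, false_rows, false_labels

rng = np.random.default_rng(0)
rows = rng.standard_normal((343, 30))   # stand-in data with the question's shape
labels = list(range(343))
t_rows, t_labels, f_rows, f_labels = split_rows(rows, labels, is_true_row)
print(t_rows.shape, f_rows.shape)
```

Appending to a list is cheap, so the only array copy happens once, in the final np.array call.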
Answer:
Your use of [] and np.append suggests you are trying to imitate the common list-append pattern with arrays. You at least read enough of the np.append docs to know that you need to use axis, and that it returns a new array (the docs are quite clear that this is a copy).
But did you test this idea with a small example, and actually look at the results (step by step)?
In [326]: rows = []
In [327]: rows = np.append(rows, np.arange(3), axis=0)
In [328]: rows
Out[328]: array([0., 1., 2.])
In [329]: rows.shape
Out[329]: (3,)
The first append doesn't really do anything: the result is the same as arange(3), just converted to float.
In [330]: rows = np.append(rows, np.arange(3), axis=0)
In [331]: rows
Out[331]: array([0., 1., 2., 0., 1., 2.])
In [332]: rows.shape
Out[332]: (6,)
Do you understand why? We joined two 1d arrays on axis 0, which makes another 1d array.
Using [] as a starting point is the same as starting with this array:
In [333]: np.array([])
Out[333]: array([], dtype=float64)
In [334]: np.array([]).shape
Out[334]: (0,)
When axis is given, np.append is just a call to concatenate:
In [335]: np.concatenate(( [], np.arange(3)), axis=0)
Out[335]: array([0., 1., 2.])
np.append sort of looks like list append, but it is not a clone. It's really just a poorly named way to call concatenate, and you can't use it properly without understanding dimensions. The np.append docs even include an example that raises an error much like the one you got with concatenate.
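To illustrate those dimension rules, here is a minimal sketch in which np.append does behave like a row append: the starting array is already 2-D with the right number of columns, and each new row is reshaped to (1, ncols) before joining on axis 0. The names ncols and new_row are only for illustration, and every call still copies the whole array.

```python
import numpy as np

ncols = 3
rows = np.empty((0, ncols))          # 2-D start: 0 rows, 3 columns
for k in range(3):
    new_row = np.arange(ncols) + k   # a 1-D row of length 3
    # reshape to (1, ncols) so both arguments are 2-D before joining on axis 0
    rows = np.append(rows, new_row.reshape(1, ncols), axis=0)

print(rows.shape)  # (3, 3)
```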
Repeatedly concatenating arrays in a loop is not a good idea. It's hard to get the dimensions right, as you found, and even when it works it is slow, since each step copies the whole array built so far, so the cost grows with every iteration.
That's why the other answer sticks with list append.
vstack is like concatenate with axis 0, but it makes sure all arguments are 2d. If the number of columns differs, though, it raises an error:
In [336]: np.vstack(( [],np.arange(3)))
Traceback (most recent call last):
File "<ipython-input-336-22038d6ef0f7>", line 1, in <module>
np.vstack(( [],np.arange(3)))
File "<__array_function__ internals>", line 180, in vstack
File "/usr/local/lib/python3.8/dist-packages/numpy/core/shape_base.py", line 282, in vstack
return _nx.concatenate(arrs, 0)
File "<__array_function__ internals>", line 180, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 1 has size 3
In [337]: np.vstack(( [0,0,0],np.arange(3)))
Out[337]:
array([[0, 0, 0],
[0, 1, 2]])
If all you are joining are rows of an (n,30) array, then you already know the number of columns in the result, so you can start from an empty array with that many columns:
In [338]: res = np.zeros((0,3))
In [339]: np.vstack(( res, np.arange(3)))
Out[339]: array([[0., 1., 2.]])
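Applied to the question's (n,30) case, the iterative version could look like this sketch. The random data and the example.sum() > 0 test are hypothetical stand-ins for the real rows and question.match. Unlike np.append, vstack promotes each 1-D example to shape (1, 30) on its own, so only the starting arrays need the right shape.

```python
import numpy as np

rng = np.random.default_rng(1)
rows = rng.standard_normal((343, 30))   # stand-in for the question's data
ncols = rows.shape[1]

true_rows = np.zeros((0, ncols))        # empty, but already shaped (0, 30)
false_rows = np.zeros((0, ncols))
for example in rows:
    # hypothetical test in place of question.match(example)
    if example.sum() > 0:
        # vstack turns the 1-D example into shape (1, 30) before joining
        true_rows = np.vstack((true_rows, example))
    else:
        false_rows = np.vstack((false_rows, example))

print(true_rows.shape[0] + false_rows.shape[0])  # 343
```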
If you pay attention to the shape details, it is possible to create an array iteratively.
But instead of collecting rows one by one, why not build a boolean mask and do the selection once?
Roughly:
mask = np.array([question.match(example) for example in rows])
true_rows = rows[mask]
false_rows = rows[~mask]
This still requires a Python-level iteration (to call question.match on each row), but overall it should be faster.
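A fuller version of that mask sketch, which also splits the labels, might look like this. It assumes rows is a 2-D NumPy array, turns labels into an array so the same mask can index it, and uses a hypothetical is_true_row in place of question.match. dtype=bool keeps the mask boolean even if the predicate returns NumPy bools or the row list is empty.

```python
import numpy as np

def is_true_row(example):
    # Hypothetical predicate standing in for question.match(example)
    return example.mean() > 0

rng = np.random.default_rng(2)
rows = rng.standard_normal((343, 30))   # stand-in data with the question's shape
labels = np.arange(343)                 # labels as an array so it can be masked too

# One Python-level pass to evaluate the predicate, producing a boolean array
mask = np.array([is_true_row(example) for example in rows], dtype=bool)

# Boolean indexing selects all matching (or non-matching) rows at once
true_rows, false_rows = rows[mask], rows[~mask]
true_labels, false_labels = labels[mask], labels[~mask]
```

If the test can be expressed with array operations, even the comprehension goes away; for this hypothetical predicate, mask = rows.mean(axis=1) > 0 computes the same test on all rows at once.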
