I have seen the question "Python get column vector from array of tuples", which I expected would answer my question, but it doesn't.
So I've prepared an example, based on the one in that post, that shows what I want to do and where I get stuck:
import numpy as np
# based on https://stackoverflow.com/a/48716125/6197439
# arr is a numpy array of tuple "pairs" of floats
oarr = [(0.109, 0.5), (0.109, 0.55), (0.109, 0.6), (0.2, 0.4), (0.3, 0.5)]
arr = np.array(oarr)
print("arr type: {} shape: {} dt {}".format(
type(arr), arr.shape, arr.dtype)) # arr type: <class 'numpy.ndarray'> shape: (5, 2) dt float64
print("slice arr[:, 1]: {}".format(arr[:, 1])) # slice arr[:, 1]: [0.5 0.55 0.6 0.4 0.5 ]
print("slice arr[0, :]: {}".format(arr[0, :])) # slice arr[0, :]: [0.109 0.5 ]
print("arr len: {}".format(len(arr))) # arr len: 5
# arr2, instead, becomes a numpy array of tuple "pairs",
# with first element tuple of string and float, and second element float
# arr2 can still be sliced by numpy fine:
oarr2 = []
for ix in range(len(arr)):
    oarr2.append( ( (str(oarr[ix][0]), oarr[ix][0]), oarr[ix][1] ) )
arr2 = np.array( oarr2, dtype=object )
print("arr2 type: {} shape: {} dt {}".format(
type(arr2), arr2.shape, arr2.dtype)) # arr2 type: <class 'numpy.ndarray'> shape: (5, 2) dt object
print("slice arr2[:, 1]: {}".format(arr2[:, 1])) # slice arr2[:, 1]: [0.5 0.55 0.6 0.4 0.5]
print("slice arr2[0, :]: {}".format(arr2[0, :])) # slice arr2[0, :]: [('0.109', 0.109) 0.5]
print("arr2 len: {}".format(len(arr2))) # arr2 len: 5
# arr2fc is where we extract the tuples in the first "column" of arr2,
# using numpy slicing syntax.
# arr2fc is again a numpy array of objects, but it is 1-D (see .shape below):
# numpy does not look inside these objects (the string/float tuple pairs),
# so extracting a "column" of the tuples (e.g. the float element at index 1)
# with numpy slicing syntax fails:
arr2fc = arr2[:, 0]
print(arr2fc) # [('0.109', 0.109) ('0.109', 0.109) ('0.109', 0.109) ('0.2', 0.2) ('0.3', 0.3)]
print("arr2fc type: {} shape: {} dt {}".format(
type(arr2fc), arr2fc.shape, arr2fc.dtype)) # arr2fc type: <class 'numpy.ndarray'> shape: (5,) dt object
print("slice arr2fc[:, 1]: {}".format(arr2fc[:, 1])) # IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
Basically, I'd like to extract the "columns" formed by the tuples in arr2fc as separate numpy arrays. From the column formed by the first (string) element of each tuple, I'd like to get a numpy array of objects (here, strings):
[ '0.109', '0.109', '0.109', '0.2', '0.3' ]
... and from the column formed by the second (float) element of each tuple, I'd like to get a numpy array of floats:
[ 0.109, 0.109, 0.109, 0.2, 0.3 ]
Sure, I can always write a Python loop that iterates over arr2fc and populates an empty Python list, then convert that list to a numpy array. However, is there something like a numpy slicing syntax that would let me extract these "columns" with a one-liner, avoiding Python loops?
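For reference, the loop-based approach described above might look like the following minimal sketch. It assumes arr2fc is rebuilt directly from a hypothetical `pairs` list instead of going through arr2, and the names `str_col` and `float_col` are illustrative only.

```python
import numpy as np

# Rebuild arr2fc from the question: a 1-D object array whose elements are
# (str, float) tuples. Assigning one element at a time stores each tuple as
# a single object instead of letting numpy expand it into a second axis.
pairs = [("0.109", 0.109), ("0.109", 0.109), ("0.109", 0.109), ("0.2", 0.2), ("0.3", 0.3)]
arr2fc = np.empty(len(pairs), dtype=object)
for i, pair in enumerate(pairs):
    arr2fc[i] = pair

# The explicit-loop baseline: populate Python lists, then convert to arrays.
str_list, float_list = [], []
for s, f in arr2fc:
    str_list.append(s)
    float_list.append(f)

str_col = np.array(str_list, dtype=object)     # object array of strings
float_col = np.array(float_list, dtype=float)  # float64 array
```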
Answer:
For that you might want to use np.vectorize. It "vectorizes" a Python function so that it can be applied to each element of an input array, producing a new array or, if the function returns a tuple, a tuple of arrays. For your example that could look like:
vectorized_split = np.vectorize(lambda x: (x[0],x[1]))
string_array,float_array = vectorized_split(arr2fc)
Note that this will not give you any numpy vectorization performance gains, as it just runs a Python loop under the hood. However, when you cannot make use of real numpy vectorization, as in this case, it at least gives you compact code.
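A minimal sketch of the same idea with the output types pinned down, assuming you want an object array for the strings (as the question asks) and float64 for the numbers. The `otypes` argument is part of `np.vectorize`; the name `split_pair` and the rebuilt `arr2fc` are illustrative.

```python
import numpy as np

# Rebuild arr2fc: a 1-D object array of (str, float) tuples.
pairs = [("0.109", 0.109), ("0.109", 0.109), ("0.109", 0.109), ("0.2", 0.2), ("0.3", 0.3)]
arr2fc = np.empty(len(pairs), dtype=object)
for i, pair in enumerate(pairs):
    arr2fc[i] = pair

# otypes fixes the dtype of each of the two outputs up front, so np.vectorize
# does not have to infer them from a trial call on the first element.
split_pair = np.vectorize(lambda t: (t[0], t[1]), otypes=[object, float])
string_array, float_array = split_pair(arr2fc)
```

Without `otypes`, the types are inferred from the first element's result, so the string output comes back as a fixed-width unicode array rather than an object array.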
Answer:
Your code, as displayed in IPython:
In [178]: oarr = [(0.109, 0.5), (0.109, 0.55), (0.109, 0.6), (0.2, 0.4), (0.3,0.5)]
...: arr = np.array(oarr)
In [179]: oarr
Out[179]: [(0.109, 0.5), (0.109, 0.55), (0.109, 0.6), (0.2, 0.4), (0.3, 0.5)]
In [180]: arr
Out[180]:
array([[0.109, 0.5 ],
[0.109, 0.55 ],
[0.109, 0.6 ],
[0.2 , 0.4 ],
[0.3 , 0.5 ]])
So starting from a list of tuples, we get a 2d array with float dtype. A list of lists would work the same way.
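A quick check of that claim, as a small sketch: building the array from tuples or from lists gives the same (5, 2) float64 array, and plain column slicing works on it. The variable names are illustrative.

```python
import numpy as np

oarr = [(0.109, 0.5), (0.109, 0.55), (0.109, 0.6), (0.2, 0.4), (0.3, 0.5)]
arr_from_tuples = np.array(oarr)
arr_from_lists = np.array([list(t) for t in oarr])

# Both inputs produce an ordinary 2-D float array, so [:, 1] selects a column.
same_array = np.array_equal(arr_from_tuples, arr_from_lists)
second_col = arr_from_tuples[:, 1]
```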
Your next array:
In [181]: oarr2 = []
...: for ix in range(len(arr)):
...:     oarr2.append( ( (str(oarr[ix][0]), oarr[ix][0]), oarr[ix][1] ) )
...: arr2 = np.array( oarr2, dtype=object )
In [182]: oarr2
Out[182]:
[(('0.109', 0.109), 0.5),
(('0.109', 0.109), 0.55),
(('0.109', 0.109), 0.6),
(('0.2', 0.2), 0.4),
(('0.3', 0.3), 0.5)]
In [183]: arr2
Out[183]:
array([[('0.109', 0.109), 0.5],
[('0.109', 0.109), 0.55],
[('0.109', 0.109), 0.6],
[('0.2', 0.2), 0.4],
[('0.3', 0.3), 0.5]], dtype=object)
Again a 2d array, (5,2), but with a tuple as one element in each row.
Selecting a column:
In [184]: arr2fc = arr2[:, 0]
In [185]: arr2fc
Out[185]:
array([('0.109', 0.109), ('0.109', 0.109), ('0.109', 0.109), ('0.2', 0.2),
('0.3', 0.3)], dtype=object)
In [186]: _.shape
Out[186]: (5,)
A 1d array of objects - each a tuple.
Converting it back to a list, we can make a 2d array and again index a column:
In [187]: arr2fc.tolist()
Out[187]:
[('0.109', 0.109),
('0.109', 0.109),
('0.109', 0.109),
('0.2', 0.2),
('0.3', 0.3)]
In [188]: np.array(arr2fc.tolist(),object)
Out[188]:
array([['0.109', 0.109],
['0.109', 0.109],
['0.109', 0.109],
['0.2', 0.2],
['0.3', 0.3]], dtype=object)
In [189]: _[:,1]
Out[189]: array([0.109, 0.109, 0.109, 0.2, 0.3], dtype=object)
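The same round trip can produce both columns at once. This is a sketch assuming you want the strings as an object array and the numbers converted to float64; `cols`, `string_col` and `float_col` are illustrative names.

```python
import numpy as np

# Rebuild arr2fc: a 1-D object array of (str, float) tuples.
pairs = [("0.109", 0.109), ("0.109", 0.109), ("0.109", 0.109), ("0.2", 0.2), ("0.3", 0.3)]
arr2fc = np.empty(len(pairs), dtype=object)
for i, pair in enumerate(pairs):
    arr2fc[i] = pair

# tolist() gives back a list of tuples, which np.array lays out as a
# (5, 2) object array with one value per cell.
cols = np.array(arr2fc.tolist(), dtype=object)
string_col = cols[:, 0]               # object array of str
float_col = cols[:, 1].astype(float)  # object column converted to float64
```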
Or with a list comprehension:
In [190]: [x[1] for x in arr2fc]
Out[190]: [0.109, 0.109, 0.109, 0.2, 0.3]
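A related plain-Python idiom, not taken from either answer, is to transpose the pairs with `zip(*...)` and convert each resulting tuple. It is still a Python-level loop, but it yields both columns in one pass. A minimal sketch:

```python
import numpy as np

# Rebuild arr2fc: a 1-D object array of (str, float) tuples.
pairs = [("0.109", 0.109), ("0.109", 0.109), ("0.109", 0.109), ("0.2", 0.2), ("0.3", 0.3)]
arr2fc = np.empty(len(pairs), dtype=object)
for i, pair in enumerate(pairs):
    arr2fc[i] = pair

# zip(*arr2fc) turns 5 pairs into 2 tuples of 5 values each.
strings, floats = zip(*arr2fc)
string_col = np.array(strings, dtype=object)
float_col = np.array(floats, dtype=float)
```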
Multidimensional indexing only works on the dimensions shown by the shape. It does not "reach through" and index the objects, even if they are, by themselves, indexable.
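To make that concrete, here is a small sketch: one index into the 1-D array returns the stored tuple, and a second, separate index then applies to that Python tuple. A two-element index in a single pair of brackets is checked against the array's own shape, so it raises IndexError.

```python
import numpy as np

# Rebuild arr2fc: a 1-D object array of (str, float) tuples.
pairs = [("0.109", 0.109), ("0.109", 0.109), ("0.109", 0.109), ("0.2", 0.2), ("0.3", 0.3)]
arr2fc = np.empty(len(pairs), dtype=object)
for i, pair in enumerate(pairs):
    arr2fc[i] = pair

# arr2fc[0] is the tuple ('0.109', 0.109); [1] indexes the tuple, not the array.
first_float = arr2fc[0][1]

# arr2fc[:, 1] asks the array for a second axis it does not have.
try:
    arr2fc[:, 1]
except IndexError as exc:
    error_message = str(exc)
else:
    error_message = None
```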
Some comparative times:
In [194]: timeit string_array,float_array = vectorized_split(arr2fc)
31.5 µs ± 277 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [195]: timeit [x[1] for x in arr2fc]
1.57 µs ± 1.07 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [196]: timeit np.array(arr2fc.tolist(),object)[:,1]
3.77 µs ± 65 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Here the np.vectorize method is much slower. For large arrays its speed is closer to that of the list comprehension, since its fixed per-call overhead is spread over many more elements.
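If you want to check the small versus large array behaviour yourself, a sketch using the standard `timeit` module could look like this. The helper `make_arr2fc` and the sizes 5 and 100,000 are hypothetical choices, and the timings will vary with machine and numpy version, so no expected numbers are shown.

```python
import timeit
import numpy as np

def make_arr2fc(n):
    # Hypothetical helper (not from the answers): builds a 1-D object array
    # of n (str, float) tuples shaped like arr2fc in the question.
    out = np.empty(n, dtype=object)
    for i in range(n):
        x = i / n
        out[i] = (str(x), x)
    return out

# Same splitting function as in the np.vectorize answer.
vectorized_split = np.vectorize(lambda x: (x[0], x[1]))

results = {}
for n in (5, 100_000):
    a = make_arr2fc(n)
    candidates = {
        "vectorize": lambda: vectorized_split(a),
        "list comprehension": lambda: [x[1] for x in a],
        "tolist + 2d index": lambda: np.array(a.tolist(), dtype=object)[:, 1],
    }
    # timeit.timeit returns total seconds for `number` calls; divide for per-call time.
    results[n] = {name: timeit.timeit(fn, number=3) / 3 for name, fn in candidates.items()}
    for name, secs in results[n].items():
        print(f"n={n:>7}: {name:<20} {secs * 1e6:12.1f} µs per call")
```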
