I have two numpy arrays:
A = [[1,2,3],
[4,5,6]]
B = [[-1,-2,-3],
[-4,-5,-6]]
I would like to combine the two into a normal Python list, such that the (i,j) elements of the two arrays are paired in the same inner list:
C= [[1,-1],[2,-2],[3,-3],[4,-4],[5,-5],[6,-6]]
Is there a way to do this better than the naive O(n^2)?
CodePudding user response:
You can do it in O(n) with:
A = np.array([[1,2,3],
[4,5,6]])
B = np.array([[-1,-2,-3],
[-4,-5,-6]])
C = list(map(list, zip(A.ravel(), B.ravel())))
Output:
[[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]
If you don't mind tuples:
C = list(zip(A.ravel(), B.ravel()))
Output:
[(1, -1), (2, -2), (3, -3), (4, -4), (5, -5), (6, -6)]
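One detail worth knowing: iterating over a NumPy array yields NumPy scalar objects rather than plain Python ints, so the inner values above are NumPy integers even though the containers are lists or tuples (in NumPy 2 and later, printing the result shows entries like np.int64(1)). A minimal sketch, assuming plain Python ints are wanted; calling tolist() on the flattened arrays first is one way to get them. The names C_np and C_py are only for illustration:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[-1, -2, -3], [-4, -5, -6]])

# Zipping the arrays directly yields NumPy scalars as the inner values.
C_np = list(map(list, zip(A.ravel(), B.ravel())))

# Converting the flattened arrays to lists first yields plain Python ints.
C_py = list(map(list, zip(A.ravel().tolist(), B.ravel().tolist())))

print(type(C_np[0][0]), type(C_py[0][0]))
```

Both results compare equal element by element; only the element types differ.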
CodePudding user response:
You can cast them to np.array, reshape and transpose:
out = np.array([A,B]).reshape(2,-1).T.tolist()
Or iterate over the sublists and use zip:
out = [list(tpl) for i in range(len(A)) for tpl in zip(A[i], B[i])]
Output:
[[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]
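To see why the reshape-and-transpose one-liner pairs the elements correctly, it can help to follow the shapes step by step. This is just the same expression split into named intermediate steps (the variable names are illustrative):

```python
import numpy as np

A = [[1, 2, 3], [4, 5, 6]]
B = [[-1, -2, -3], [-4, -5, -6]]

stacked = np.array([A, B])     # shape (2, 2, 3): axis 0 selects A or B
flat = stacked.reshape(2, -1)  # shape (2, 6): row 0 is A flattened, row 1 is B flattened
pairs = flat.T                 # shape (6, 2): row k is [A_flat[k], B_flat[k]]
out = pairs.tolist()

print(stacked.shape, flat.shape, pairs.shape)
print(out)
```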
CodePudding user response:
First, you show two lists, but call them arrays.
In [102]: A = [[1,2,3],
...: [4,5,6]]
...: B = [[-1,-2,-3],
...: [-4,-5,-6]]
A version of the pure list approach:
In [103]: [[i,j] for a,b in zip(A,B) for i,j in zip(a,b)]
Out[103]: [[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]
And a NumPy approach. This works with lists or arrays, since np.stack converts lists to arrays as needed.
In [104]: np.stack((A,B),axis=2).reshape(-1,2).tolist()
Out[104]: [[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]
np.array((A,B)) can also be used, but it needs some transposing, which is pretty cheap.
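A sketch of what that transposing might look like: np.array((A,B)) puts the "which input" axis first, so it has to be moved to the end before reshaping. Using transpose(1, 2, 0) is one way to do it, not necessarily the only one:

```python
import numpy as np

A = [[1, 2, 3], [4, 5, 6]]
B = [[-1, -2, -3], [-4, -5, -6]]

arr = np.array((A, B))          # shape (2, 2, 3): input axis first
moved = arr.transpose(1, 2, 0)  # shape (2, 3, 2): input axis last
out = moved.reshape(-1, 2).tolist()

# The transposed array matches what np.stack(..., axis=2) builds directly.
same = np.array_equal(moved, np.stack((A, B), axis=2))
print(out, same)
```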
Big-O analysis isn't that useful when evaluating alternatives like this. The big divide is between iterating in Python, as the double list comprehension does, and using NumPy array methods (which iterate in compiled code). But there are nuances to that as well. Iterating over arrays in Python is slower than iterating over lists. Creating arrays from lists takes time. And there's a matter of scaling: array approaches often have a high setup cost, but scale better.
Let's compare the times:
In [105]: timeit [[i,j] for a,b in zip(A,B) for i,j in zip(a,b)]
1.69 µs ± 29.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [106]: timeit np.stack((A,B),axis=2).reshape(-1,2).tolist()
22.2 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [107]: %%timeit A1=np.array(A); B1=np.array(B)
...: np.stack((A1,B1),axis=2).reshape(-1,2).tolist()
14.5 µs ± 65.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
For this small example, the pure list approach is fastest. Starting with arrays instead of lists improves the array approach, but it is still slower.
Try something larger:
In [109]: AA = np.ones((200,300)).tolist()
In [110]: timeit [[i,j] for a,b in zip(AA,AA) for i,j in zip(a,b)]
10.4 ms ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [111]: %%timeit A1=np.array(AA); B1=np.array(AA)
...: np.stack((A1,B1),axis=2).reshape(-1,2).tolist()
13.4 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Still not enough to give the arrays the advantage.
But what if we drop the requirement that the result be a list?
In [119]: %%timeit A1=np.array(AA); B1=np.array(AA)
...: np.stack((A1,B1),axis=2).reshape(-1,2)
144 µs ± 4.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
tolist() is compiled and relatively fast, but for large enough arrays it still takes time. Comparing 13.4 ms with 144 µs shows that here nearly all of the time goes into tolist().
To sum up: if you start with lists and return a list, the pure list approach remains best. But starting and ending with arrays can be faster. Mixing lists and arrays slows down both.
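For anyone who wants to rerun this comparison outside IPython, here is a rough sketch using the standard timeit module. The function names and the repeat counts are arbitrary choices for illustration, and the numbers will differ by machine and NumPy version:

```python
import timeit
import numpy as np

AA = np.ones((200, 300)).tolist()
A1 = np.array(AA)
B1 = np.array(AA)

def pure_list():
    # Lists in, list out: all iteration happens in Python.
    return [[i, j] for a, b in zip(AA, AA) for i, j in zip(a, b)]

def array_to_list():
    # Arrays in, list out: compiled stacking plus a tolist() conversion.
    return np.stack((A1, B1), axis=2).reshape(-1, 2).tolist()

def array_only():
    # Arrays in, array out: no Python objects are created per element.
    return np.stack((A1, B1), axis=2).reshape(-1, 2)

for fn in (pure_list, array_to_list, array_only):
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per call")
```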
CodePudding user response:
An alternative, NumPy-only solution to the accepted answer:
A = np.array([[1,2,3],
[4,5,6]])
B = np.array([[-1,-2,-3],
[-4,-5,-6]])
C = np.column_stack((A.ravel(),B.ravel())).tolist()
Output:
[[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]
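As a small illustration of what column_stack does here: it treats each 1-D input as a column of a 2-D array, which for 1-D inputs should match np.stack with axis=1. The variable names are illustrative:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[-1, -2, -3], [-4, -5, -6]])

# Each flattened array becomes one column of a (6, 2) array.
cols = np.column_stack((A.ravel(), B.ravel()))

# For 1-D inputs this gives the same array as stacking along axis 1.
same = np.array_equal(cols, np.stack((A.ravel(), B.ravel()), axis=1))

C = cols.tolist()
print(cols.shape, same)
```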
