How to create a 2D array with all possible combinations of a constrained range for each column

Time:02-06

I have three ranges and want to create a 3-column array containing every possible combination of their values, in a specific order. I know how to do this with a loop. However, the real data will have many more than 3 columns and the ranges are very large, so I think a loop will be inefficient and I would like a faster way of doing this. The real dataset will be approximately 5 GB, so efficiency is key for me. As an example:

import numpy as np

inc = 1
a = np.arange(1001, 1002 + inc, inc)
b = np.arange(1, 3 + inc, inc)
c = np.arange(1, 5 + inc, inc)

I want to create an output that looks like:

array([[1001,    1,    1],
       [1001,    1,    2],
       [1001,    1,    3],
       [1001,    1,    4],
       [1001,    1,    5],
       [1001,    2,    1],
       [1001,    2,    2],
       [1001,    2,    3],
       [1001,    2,    4],
       [1001,    2,    5],
       [1001,    3,    1],
       [1001,    3,    2],
       [1001,    3,    3],
       [1001,    3,    4],
       [1001,    3,    5],
       [1002,    1,    1],
       [1002,    1,    2],
       [1002,    1,    3],

This output is not complete, but it shows what I want. I should add that I am doing this because I have an input table of the same format but with missing rows, and I want to identify the missing rows by comparing the input dataset to this 'ideal' table. As mentioned above, I can do this with a for loop, but I want to find a more Pythonic way of doing it if possible.

CodePudding user response:

I recommend using numpy.meshgrid, because it builds the combinations with vectorized NumPy operations and should run significantly faster than a Python loop.

>>> np.array(np.meshgrid(a,b,c)).T.reshape((-1, 3))
array([[1001,    1,    1],
       [1001,    2,    1],
       [1001,    3,    1],
       [1002,    1,    1],
       [1002,    2,    1],
       [1002,    3,    1],
       [1001,    1,    2],
       [1001,    2,    2],
       [1001,    3,    2],
       [1002,    1,    2],

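Note that the rows above are not in the order you asked for: the second column varies fastest here. If order is important, using indexing='ij' and flattening each grid seems to do it.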

np.array([m.flatten() for m in np.meshgrid(a,b,c, indexing='ij')]).T
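
If you have more than three columns, the indexing='ij' approach above can be wrapped in a small helper. This is only a sketch: cartesian_rows is a name I made up (it is not a NumPy function), and it assumes every input is a 1-D array of a compatible dtype.

```python
import numpy as np

def cartesian_rows(*ranges):
    """Return every combination of the 1-D inputs as rows.

    The first column varies slowest and the last column varies fastest.
    (Hypothetical helper, not part of NumPy.)
    """
    # indexing='ij' keeps the axes in the same order as the inputs.
    grids = np.meshgrid(*ranges, indexing='ij')
    # Flatten each grid in C order and use it as one column of the result.
    return np.stack([g.ravel() for g in grids], axis=1)

inc = 1
a = np.arange(1001, 1002 + inc, inc)
b = np.arange(1, 3 + inc, inc)
c = np.arange(1, 5 + inc, inc)

table = cartesian_rows(a, b, c)
print(table.shape)
print(table[:6])
```

Keep in mind that meshgrid materializes one full-size grid per column before np.stack copies them, so peak memory is probably around twice the size of the final table.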

CodePudding user response:

You can do it easily with itertools.product from the standard library, which yields the combinations in the order you want (last column varying fastest):

import itertools as it
import numpy as np

# product yields tuples; collect them into a list and convert to a 2D array
perms = np.array(list(it.product(a, b, c)))

Output:

>>> perms
array([[1001,    1,    1],
       [1001,    1,    2],
       [1001,    1,    3],
       [1001,    1,    4],
       [1001,    1,    5],
       [1001,    2,    1],
       [1001,    2,    2],
       [1001,    2,    3],
       [1001,    2,    4],
       [1001,    2,    5],
       [1001,    3,    1],
       [1001,    3,    2],
       [1001,    3,    3],
       [1001,    3,    4],
       [1001,    3,    5],
       [1002,    1,    1],
       [1002,    1,    2],
       [1002,    1,    3],
       [1002,    1,    4],
       [1002,    1,    5],
       [1002,    2,    1],
       [1002,    2,    2],
       [1002,    2,    3],
       [1002,    2,    4],
       [1002,    2,    5],
       [1002,    3,    1],
       [1002,    3,    2],
       [1002,    3,    3],
       [1002,    3,    4],
       [1002,    3,    5]])
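
For the stated goal of finding missing rows, here is a rough sketch that works out which combinations are absent without comparing against the full table row by row. The observed array is made-up test data (the full table with two rows deleted), and the approach assumes every column is an evenly spaced integer range with the same step inc, as in the question.

```python
import itertools as it
import numpy as np

inc = 1
a = np.arange(1001, 1002 + inc, inc)
b = np.arange(1, 3 + inc, inc)
c = np.arange(1, 5 + inc, inc)

# Made-up input: the full table with rows 3 and 17 removed.
full = np.array(list(it.product(a, b, c)))
observed = np.delete(full, [3, 17], axis=0)

# Convert each observed value to its position within its own range.
starts = np.array([a[0], b[0], c[0]])
shape = (len(a), len(b), len(c))
positions = ((observed - starts) // inc).T        # shape (3, n_rows)

# Collapse the three positions into one index into the ideal table.
flat = np.ravel_multi_index(tuple(positions), shape)

# Mark the rows that are present; everything else is missing.
present = np.zeros(int(np.prod(shape)), dtype=bool)
present[flat] = True

# Turn the missing flat indices back into column values.
missing_pos = np.unravel_index(np.flatnonzero(~present), shape)
missing = np.stack(missing_pos, axis=1) * inc + starts
print(missing)
```

The only full-length array this needs is a boolean mask with one entry per ideal row, which may matter at the 5 GB scale mentioned in the question.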