I need to rate each combination in order to get the best one.
I completed the first step but it is not optimized at all.
When the value of RQ or NBPIS or NBSER is big, my code is much too long.
Do you have an idea to get the same result much faster?
Thank you very much
import numpy as np
from itertools import combinations, combinations_with_replacement
#USER SETTINGS
RQ=['A','B','C','D','E','F','G','H']
NBPIS=3
NBSER=3
#CODE
Combi1=np.array(list(combinations_with_replacement(RQ,NBPIS)))
Combi2=combinations_with_replacement(Combi1,NBSER)
Combi3=np.array([])
Compt=0
First=0
for X in Combi2:
Long=0
Compt=Compt 1
Y=np.array(X)
for Z in RQ:
Long=Long 1
if Z not in Y:
break
elif Long==len(RQ):
if First==0:
Combi3=Y
Combi3 = np.expand_dims(Combi3, axis = 0)
First=1
else:
Combi3=np.append(Combi3, [Y], axis = 0)
#RESULTS
print(Combi3)
print(Combi3.shape)
print(Compt)
CodePudding user response:
Assuming your code produces the desirable result, the first step to optimize your code is refactoring it. This might help others to jump in and help as well.
Let's start making a function of it.
#USER SETTINGS
RQ=['A','B','C','D','E','F','G','H']
NBPIS=3
NBSER=3
def your_code():
Combi1=np.array(list(combinations_with_replacement(RQ,NBPIS)))
Combi2=combinations_with_replacement(Combi1,NBSER)
Combi3=np.array([])
Compt=0
First=0
for X in Combi2:
Long=0
Compt=Compt 1
Y=np.array(X)
for Z in RQ:
Long=Long 1
if Z not in Y:
break
elif Long==len(RQ):
if First==0:
Combi3=Y
Combi3 = np.expand_dims(Combi3, axis = 0)
First=1
else:
Combi3=np.append(Combi3, [Y], axis = 0)
shape = Combi3.shape
size = Compt
return Combi3, shape, size
Refactoring
Notice that Compt is equal to len(Combi2), so turning Combi2 as a numpy array will help to simplify the code. This also allow the variable Y to be replaced by X only. Also, there is no need for Combi1 to be a numpy array since it is only consumed by combinations_with_replacement.
def your_code_refactored():
Combi1 = combinations_with_replacement(RQ,NBPIS)
Combi2 = np.array(list(combinations_with_replacement(Combi1,NBSER)))
Combi3=np.array([])
First=0
for X in Combi2:
Long=0
for Z in RQ:
Long=Long 1
if Z not in X:
break
elif Long==len(RQ):
if First==0:
Combi3=X
Combi3 = np.expand_dims(Combi3, axis = 0)
First=1
else:
Combi3=np.append(Combi3, [X], axis = 0)
shape = Combi3.shape
size = len(Combi2)
return Combi3, shape, size
Next thing is to refactor how Combi3 is created and populated. The varialbe First is used to expand Combi3 dimension in the first interaction only, so this logic can be simplified as,
def your_code_refactored():
Combi1 = combinations_with_replacement(RQ,NBPIS)
Combi2 = np.array(list(combinations_with_replacement(Combi1,NBSER)))
Combi3 = np.empty((0, NBPIS, NBSER))
for X in Combi2:
Long=0
for Z in RQ:
Long=Long 1
if Z not in X:
break
elif Long==len(RQ):
Combi3 = np.append(Combi3, [X], axis = 0)
shape = Combi3.shape
size = len(Combi2)
return Combi3, shape, size
It seems Combi2 is populated only with combinations that have at least one of each element from RQ. This is accomplished by testing if each element of RQ is in X, which is basically checking if RQ is a subset of X. So it is simplified further,
def your_code_refactored():
Combi1 = combinations_with_replacement(RQ,NBPIS)
Combi2 = np.array(list(combinations_with_replacement(Combi1,NBSER)))
Combi3 = np.empty((0, NBPIS, NBSER))
unique_RQ = set(RQ)
for X in Combi2:
if unique_RQ.issubset(X.flatten()):
Combi3 = np.append(Combi3, [X], axis = 0)
shape = Combi3.shape
size = len(Combi2)
return Combi3, shape, size
This looks much simpler. Time to make it faster, maybe :)
Optimizing
One way this code can be optimized is to replace np.append by list.append. In numpy's documentation we see that np.append reallocate a larger and larger array each time it is called. The code might be speed up with list.append, since it over-allocates memory. So the code could be rewritten with list comprehension.
def your_code_refactored_and_optimized():
Combi1 = combinations_with_replacement(RQ,NBPIS)
Combi2 = np.array(list(combinations_with_replacement(Combi1,NBSER)))
unique_RQ = set(RQ)
Combi3 = np.array([X for X in Combi2 if unique_RQ.issubset(X.flatten())])
shape = Combi3.shape
size = len(Combi2)
return Combi3, shape, size
Testing
Now we can see it run faster.
import timeit
n = 5
print(timeit.timeit('your_code()', globals=globals(), number=n))
print(timeit.timeit('your_code_refactored_and_optimized()', globals=globals(), number=n))
This isn't much a gain in speed but it's something :)
CodePudding user response:
I have an idea to reduce execution time by removing unnecessary combinations, simplifying with the following example with :
RQ=['A','B','C']
NBPIS=3
NBSER=3
Currently with :
Combi1 = combinations_with_replacement(RQ,NBPIS)
print(list(Combi1))
[('A', 'A', 'A'), ('A', 'A', 'B'), ('A', 'A', 'C'), ('A', 'B', 'B'), ('A', 'B', 'C'), ('A', 'C', 'C'), ('B', 'B', 'B'), ('B', 'B', 'C'), ('B', 'C', 'C'), ('C', 'C', 'C')]
But with :
Combi1 = list(list(combinations(RQ,W)) for W in range(1,NBPIS 1))
print(Combi1)
[[('A',), ('B',), ('C',)], [('A', 'B'), ('A', 'C'), ('B', 'C')], [('A', 'B', 'C')]]
Problem with :
Combi1 = list(list(combinations(RQ,W)) for W in range(1,NBPIS 1))
Error message :
Combi3 = np.array([X for X in Combi2 if unique_RQ.issubset(X.flatten())])
TypeError: unhashable type: 'list'
But with :
(Combi1 = combinations(RQ,W) for W in range(1,NBPIS 1))
print(Combi3)
[]
Questions :
- For Combi1,
Instead of :
[[('A',), ('B',), ('C',)], [('A', 'B'), ('A', 'C'), ('B', 'C')], [('A', 'B', 'C')]]
how to get this ? :
[('A'), ('B'), ('C'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]
- For Combi3, is it possible to get an array with different sizes ?
Instead of :
[[['A' 'A' 'A'] ['A' 'A' 'A'] ['A' 'B' 'C']]...
Obtain this ? :
[[['A'] ['A'] ['A' 'B' 'C']]...
