Finding minimum no. of sub-list from the list(l2) which have maximum elements in a other list(l1). A-CodePudding

Get the least no. of subarrays from the list(l2) which in total have the maximum number of elements from the original list(l1). Subarrays element (Answer) should not be more than L1 element, which Means if 2 is repeated 2 times in L1, so in answer including all subarrays count of 2 can't be more than 2.

Example-1

L1 = [2,3,5,6]
L2 = [[3], [2, 5], [2, 6], [3, 5], [5, 6], [2, 3, 6]]

The answer for the above example is [[2, 6], [3, 5]]

Example-2

L1 = [2,3,5,6]
L2 = [[3], [2, 5], [2, 6], [5, 6], [2, 3, 6]]

The answer for the above example is [[2, 3, 6]]

I tried the below approach, but it is taking time due to res_comb since it will have lots of combinations if the length of res is more, let's assume 40. I need something which is much faster.

def return_similar(res,search):  
    res_comb = [list(map(list,combinations(res,i))) for i in range(1,len(res) 1)]
    dict_search = defaultdict(int)
    for x in search:
        dict_search[x] =1
    match = []
    maxs=0
    for x in res_comb:
        for val in x:
            final_res = []
            for inner in val:
                final_res.extend(inner)
            dict_final_res = defaultdict(int)
            for x in final_res:
                dict_final_res[x] =1
            count=0
            counter=0
            for x in set(final_res):
                if dict_search[x]<dict_final_res[x]:
                    counter=1
                    break
            if counter==0:
                count = len(final_res)
                if count>maxs:
                    maxs=count
                    match.clear()
                    match.append(val)
                elif (count==maxs) and (count!=0) :
                    match.append(val)
                    
    return match

CodePudding user response：

Your problem can easily be reduced to maximum weight independent set, where:

the vertices are the lists in L2;
two lists share an edge if they have at least one element in common;
the weight of a list is its length.

Sadly, the maximum independent set problem is NP-hard, and hard to approximate.

A much easier problem is maximal independent set. A solution to maximal independent set problem is an approximate solution for maximum independent set, although not necessarily a good one.

Finding a maximal independent set with module networkx:

from networkx import Graph
from networkx import maximal_independent_set
from itertools import combinations

L2 =  [[3], [2, 5], [2, 6], [5, 6], [2, 3, 6]]
vertices = [frozenset(u) for u in L2]

G = Graph()
G.add_nodes_from(vertices)
G.add_edges_from((u,v) for u,v in combinations(vertices, 2) if u.intersection(v))

mis = maximal_independent_set(G)

print(mis)
# [frozenset({3}), frozenset({5, 6})]

As you can see, the algorithm found [{3}, {5,6}], which is suboptimal: [{2,5}, {3,6}] would have been better.

Note that networkx.maximal_independent_set uses a randomized algorithm: you could run it several times, and keep the best solution found.

Resources