Get the longest common substring in a dictionary and create a new one in Python

I have a dictionary with generated English words. For this example I will use this:

dict = {'Hi': 'TEST', 'Hi there': 'TEST', 'Billie Joe': 'TEST', 'Banana': 'ABC5', 'Banana is red': 'ABC5', 'Cellphone': 'TEST', 'Idea': 'TEST', 'Hi there, hello world': 'TEST'}

If you look closely, you'll see that some keys contain similar strings and have the same value: the first key 'Hi', the second key 'Hi there', and the last key 'Hi there, hello world' all contain the string 'Hi' and all have the value 'TEST'. Another example is 'Banana' and 'Banana is red': the common substring is 'Banana', and both have the value 'ABC5'.

What I want to do is create a new dictionary like this:

dict2 = {'Billie Joe': 'TEST', 'Banana is red': 'ABC5', 'Cellphone': 'TEST', 'Idea': 'TEST', 'Hi there, hello world': 'TEST'}

The first dictionary has some keys that share a common substring (in my example 'Hi') and the same value ('TEST'). Among several keys that share the same substring, I want to find and keep only the longest key (a string, essentially) and put only that key in a new dictionary. If a key does not share a substring with any other key, I want to copy its key-value pair directly to the new dictionary.

I hope I'm clear. Can somebody help me or guide me toward solving this problem?

Thanks in advance

CodePudding user response：

d = {'Hi': 'TEST', 'Hi there': 'TEST', 'Billie Joe': 'TEST', 'Banana': 'ABC5', 'Banana is red': 'ABC5', 'Cellphone': 'TEST', 'Idea': 'TEST', 'Hi there, hello world': 'TEST'}

new_d = {}
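for key, value in d.items():
    # Is this key contained in a different key that has the same value?
    is_substring = False
    for check in d.keys():
        if key != check and key in check and d[key] == d[check]:
            is_substring = True
            break  # one match is enough

    # Keep only keys that are not contained in another matching key
    if not is_substring:
        new_d[key] = value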

This iterates over the dictionary and, for each key, checks whether it is a substring of any other key (not itself) that has the same value. Only keys for which no such longer key exists are copied to new_d.
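The same check can also be written more compactly with any() inside a dict comprehension. This is only a sketch of the loop above, not a different algorithm; the function name keep_longest_keys is made up for illustration.

```python
def keep_longest_keys(d):
    """Return a new dict without keys that are substrings of
    another key carrying the same value (hypothetical helper name)."""
    return {
        key: value
        for key, value in d.items()
        # drop `key` if some *other* key contains it and has the same value
        if not any(
            key != other and key in other and value == other_value
            for other, other_value in d.items()
        )
    }


d = {'Hi': 'TEST', 'Hi there': 'TEST', 'Billie Joe': 'TEST', 'Banana': 'ABC5',
     'Banana is red': 'ABC5', 'Cellphone': 'TEST', 'Idea': 'TEST',
     'Hi there, hello world': 'TEST'}

new_d = keep_longest_keys(d)
print(new_d)
```

Like the loop version, this compares every key against every other key, so it is quadratic in the number of keys. That should be fine for small dictionaries like the one in the question.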

CodePudding user response：

A different approach.

1. Make a copy of your original dict with .copy(). Don't use '=', because that only creates a second name for the same dict, so deleting from it would also change the original.

2. Find keys that match, then delete the shorter key and its value. This way the unmatched keys always remain in the output.

3. pop(key, None) is used so that if the key no longer exists, pop returns None instead of raising a KeyError, which keeps the code from breaking. Cheers

# Named d rather than dict so the built-in dict type is not shadowed
d = {'Hi': 'TEST', 'Hi there': 'TEST', 'Billie Joe': 'TEST', 'Banana': 'ABC5', 'Banana is red': 'ABC5', 'Cellphone': 'TEST', 'Idea': 'TEST', 'Hi there, hello world': 'TEST'}
output = d.copy()

for key1, value1 in d.items():
    for key2, value2 in d.items():
        # key1 is contained in a longer key with the same value: drop key1
        if key1 != key2 and key1 in key2 and value1 == value2:
            output.pop(key1, None)
            break

print(output)
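The two points about .copy() and pop(key, None) can be checked in isolation. The snippet below is a small illustration with made-up variable names (original, alias, shallow, result), not part of the solution itself.

```python
original = {'Hi': 'TEST', 'Hi there': 'TEST'}

alias = original            # plain assignment: both names refer to the same dict
shallow = original.copy()   # a new dict holding the same key/value pairs

alias.pop('Hi')
print('Hi' in original)     # the deletion is visible through `original` too
print('Hi' in shallow)      # the copy is unaffected

# With a default, pop returns that default for a missing key
# instead of raising KeyError.
result = shallow.pop('no such key', None)
print(result)
```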