Say you have a list of strings that all have the same length. You want to match every string with, say, 1 or 2 other strings that are most similar to it (sharing the same character at the same position) or least similar to it (not sharing a character at the same position).
CodePudding user response:
It may not be the most efficient way, but you can get the values that appear in both lists with a set intersection like this (note that a set does not preserve order):
>>> list_1 = ["hello", "world", "today is a good day", "have a nice day"]
>>> list_2 = ["cats", "dogs", "today is a good day", "have a nice day"]
>>> set(list_1) & set(list_2)
{'today is a good day', 'have a nice day'}
If the order is important and the matching items sit at the same index in both lists, you can do it with a comprehension and zip like this:
>>> list_1 = ["hello", "world", "today is a good day", "have a nice day"]
>>> list_2 = ["cats", "dogs", "today is a good day", "have a nice day"]
>>> print([i for i, j in zip(list_1, list_2) if i == j])
['today is a good day', 'have a nice day']
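The zip version above only compares items at the same index, so a shared string that sits at different positions in the two lists is missed. A minimal sketch of an order-preserving alternative follows; the name `lookup` and the reordered `list_2` are illustrative assumptions, not part of the original answer:

```python
list_1 = ["hello", "world", "today is a good day", "have a nice day"]
# Assumed variation: the shared strings are at different indices here.
list_2 = ["have a nice day", "cats", "dogs", "today is a good day"]

# Build the set once so each membership test is O(1) on average.
lookup = set(list_2)

# Keep the order of list_1; positions in list_2 do not matter.
common_in_order = [s for s in list_1 if s in lookup]
print(common_in_order)  # ['today is a good day', 'have a nice day']
```

Like the answer above, this only finds exact matches between whole strings; it does not measure how similar two different strings are.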
CodePudding user response:
It depends on what you mean by "similar". I'd say two strings such as 'abcdefg' and 'gabcdef' are very similar, but under your definition they are completely different.
Here is some code that implements your idea.
The function most_similar_index returns the indices of the n strings in a list that are most similar to a given string:
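To make the point about 'abcdefg' and 'gabcdef' concrete, here is a small sketch that compares the positional count with a sequence-based measure. Using `difflib.SequenceMatcher` from the standard library is my own choice for illustration, and `positional_matches` is a hypothetical helper name:

```python
from difflib import SequenceMatcher

def positional_matches(a, b):
    """Count positions where both strings hold the same character."""
    return sum(x == y for x, y in zip(a, b))

a, b = "abcdefg", "gabcdef"

# Positional definition: every character is shifted by one, so nothing lines up.
pos = positional_matches(a, b)
print(pos)  # 0

# Sequence-based definition: the shared run 'abcdef' gives a high ratio.
ratio = SequenceMatcher(None, a, b).ratio()
print(ratio)
```

Which measure is appropriate depends on the data: the positional count fits fixed-layout strings, while a sequence-based ratio tolerates shifts.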
import numpy as np

def similarity(str1, str2):
    # number of positions where both strings have the same character
    return sum([str1[i] == str2[i] for i in range(len(str1))])

def most_similar_index(list_string, s, n):
    """
    list_string : list of all strings of same size
    s : string of same size as all of those in list_string
    n : number of indices to return
    returns indices of the n closest strings to the given string
    """
    temp_list = []
    for string in list_string:
        temp_list.append(similarity(s, string))
    temp_list = np.array(temp_list)
    # argsort is ascending, so read the last n indices in reverse order
    return np.argsort(temp_list)[-1:-n-1:-1]
Result:
>>> list_string = ['abcde', 'abcdf', 'xbcde', 'xeeee', 'aeeef']
>>> s = 'abcff'
>>> most_similar_index(list_string, s, 3)
array([1, 0, 4], dtype=int64)
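The function above ranks a list against one given string. The question asks to pair every string in the list with its most or least similar neighbours, so here is a hedged sketch of that extension in plain Python. The name `rank_neighbours` and its parameters are hypothetical, and ties are broken by list order because Python's sort is stable:

```python
def similarity(a, b):
    """Count positions where both strings hold the same character."""
    return sum(x == y for x, y in zip(a, b))

def rank_neighbours(strings, n=2, most_similar=True):
    """Map each index to the indices of its n most (or least) similar strings."""
    result = {}
    for i, s in enumerate(strings):
        # Every other string is a candidate; a string is never paired with itself.
        others = [j for j in range(len(strings)) if j != i]
        # Highest score first for "most similar", lowest first otherwise.
        others.sort(key=lambda j: similarity(s, strings[j]), reverse=most_similar)
        result[i] = others[:n]
    return result

strings = ['abcde', 'abcdf', 'xbcde', 'xeeee', 'aeeef']
closest = rank_neighbours(strings, n=2)
farthest = rank_neighbours(strings, n=1, most_similar=False)

print(closest[0])   # [1, 2]
print(farthest[2])  # [4]
```

This compares every pair, so the cost grows quadratically with the number of strings; for large lists a vectorised NumPy comparison may be worth considering.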
