I'm trying to calculate the cosine similarity of two vectors stored in the dictionaries dict_1 and dict_2. This is my code:
import math
from numpy import dot

def norma(dict):
    sqr_sum = 0.0
    for x in dict:
        sqr_sum += dict[x] * dict[x]
    return math.sqrt(sqr_sum)

def cosine_similarity(dict_1, dict_2):
    List1 = list(dict_1.values())
    List2 = list(dict_2.values())
    similarity = dot(List1,List2) / (norma(dict_1) * norma(dict_2))
    return round(similarity, 2)

if __name__ == '__main__':
    print(cosine_similarity({"a": 1, "b": 2, "c": 3}, {"b": 4, "c": 5, "d": 6}))
The function norma() calculates the norm of a dict's values. When I run the code I get 0.97, but the expected output is approximately 0.7. What am I missing?
CodePudding user response:
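A quick and dirty way.
Your code pairs the values by position, so "a" is multiplied with "b", "b" with "c", and "c" with "d". The dot product has to pair values by key instead. You don't have to calculate anything for keys that are not in both dictionaries: a missing key counts as 0, and constant * 0 = 0. The norms still use all values of each dictionary.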
import math
import numpy as np

def norma(dct):
    return math.sqrt(sum(x*x for x in dct.values()))

def cosine_similarity(dict_1, dict_2):
    intersecting_keys = list(dict_1.keys() & dict_2.keys())
    List1 = list(dict_1[k] for k in intersecting_keys)
    List2 = list(dict_2[k] for k in intersecting_keys)
    similarity = np.dot(List1,List2) / (norma(dict_1) * norma(dict_2))
    return round(similarity, 2)

if __name__ == '__main__':
    print(cosine_similarity({"a": 1, "b": 2, "c": 3}, {"c": 5, "b": 4, "d": 6}))
Outputs:
0.7
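
If you would rather not depend on NumPy for this, the same idea can be sketched with the standard library only. This is just one possible variant, not the only way to do it: the name cosine_similarity_pure is made up for illustration, and returning 0.0 when one of the vectors has zero norm is my own assumption (the similarity is undefined in that case), so adjust it to whatever your application expects.

```python
import math


def cosine_similarity_pure(vec_1, vec_2):
    """Cosine similarity of two sparse vectors stored as {key: value} dicts.

    Hypothetical helper name, used here for illustration only.
    """
    # Only keys present in both dicts contribute to the dot product;
    # a key missing from one side is treated as 0 there.
    dot = sum(vec_1[k] * vec_2[k] for k in vec_1.keys() & vec_2.keys())
    # The norms still use every value of each dict.
    norm_1 = math.sqrt(sum(v * v for v in vec_1.values()))
    norm_2 = math.sqrt(sum(v * v for v in vec_2.values()))
    if norm_1 == 0 or norm_2 == 0:
        # Assumption: report 0.0 for an empty or all-zero vector,
        # where the cosine similarity is otherwise undefined.
        return 0.0
    return dot / (norm_1 * norm_2)


if __name__ == '__main__':
    first = {"a": 1, "b": 2, "c": 3}
    second = {"b": 4, "c": 5, "d": 6}
    # Shared keys are "b" and "c": (2*4 + 3*5) / sqrt(14 * 77)
    print(round(cosine_similarity_pure(first, second), 2))
```

Rounding is left to the caller here so the function returns the full-precision value; wrap the result in round(..., 2) if you want the same two-decimal output as above.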
