How do I get the differences between two dictionaries?

How can I perform the operations below on two dictionaries? dict2 can be very large, and I believe set operations are a bit slower.

1. Get the values for the keys of dict1 from dict2, whether the values are the same or different:

dict1 = {'a': 1, 'b': 2}
dict2 = {'a': 3, 'b': 4, 'd': 5}

# Output: {'a': 3, 'b': 4}

dict1 = {'a': 1, 'b': 2}
dict2 = {'a': 1, 'b': 2, 'd': 5}

# Output: {'a': 1, 'b': 2}

2. Get the keys that are present in dict1 but not in dict2, with their values from dict1:

dict1 = {'a': 1, 'b': 2, 'c': 6}
dict2 = {'a': 3, 'b': 4, 'd': 5}

# Output: {'c': 6}

3. When all the keys of dict1 exist in dict2, get the values for the keys of dict1 from dict2:

dict1 = {'a': 1, 'b': 2, 'c': 6}
dict2 = {'a': 3, 'b': 4, 'c': 6, 'd': 5}

# Output: {'a': 3, 'b': 4, 'c': 6}

I tried the following methods:

def keys_in_dict1_but_not_in_dict2(dict1, dict2):
    # Keep the entries of dict1 whose keys are missing from dict2.
    d1 = {}
    for key in dict1:
        if key not in dict2:
            d1[key] = dict1[key]
    return d1

def keys_in_dict1_and_dict2(dict1, dict2):
    # For keys of dict1 that dict2 also has, take the value from dict2.
    d1 = {}
    for key in dict1:
        if key in dict2:
            d1[key] = dict2[key]
    return d1

I could have used sets, but I believe that is slower as the dictionaries grow. The plain loops above may also take more time as the dictionaries grow. What is the most efficient way to handle this? Is this the right approach, or is there a better way to handle these scenarios?

Answer:

Try using dictionary comprehensions.

Get the values for the keys of dict1 from dict2, skipping keys that dict2 does not have:

part1 = {key: dict2[key] for key in dict1 if key in dict2}

Keys present in dict1 but not in dict2, with their values from dict1:

part2 = {key: value for key, value in dict1.items() if key not in dict2}

Get the values for the keys of dict1 from dict2, when every key of dict1 is known to be in dict2 (otherwise this raises KeyError):

part3 = {key: dict2[key] for key in dict1}
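
One way to check these comprehensions is to run them on the sample dictionaries from the question. This is a minimal sketch; the data is copied from the question, and the failing part3 call is included only to show that part3 assumes every key of dict1 is present in dict2.

```python
# Sample data from the question's second scenario.
dict1 = {'a': 1, 'b': 2, 'c': 6}
dict2 = {'a': 3, 'b': 4, 'd': 5}

# Part 1: values from dict2 for the keys of dict1 that dict2 also has.
part1 = {key: dict2[key] for key in dict1 if key in dict2}

# Part 2: entries of dict1 whose keys are missing from dict2.
part2 = {key: value for key, value in dict1.items() if key not in dict2}

print(part1)  # {'a': 3, 'b': 4}
print(part2)  # {'c': 6}

# Part 3 assumes every key of dict1 is in dict2; here 'c' is missing,
# so the lookup dict2['c'] raises KeyError.
try:
    part3 = {key: dict2[key] for key in dict1}
except KeyError as exc:
    print('missing key:', exc)  # missing key: 'c'

# With the question's third scenario, every key of dict1 is present.
dict2_full = {'a': 3, 'b': 4, 'c': 6, 'd': 5}
part3 = {key: dict2_full[key] for key in dict1}
print(part3)  # {'a': 3, 'b': 4, 'c': 6}
```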

Answer:

The problem is as follows: iterate through the keys of one dictionary, decide if they are in another, and extract the appropriate values. You can either do both steps in one loop, or separate the key checking from the value extraction. The choice will depend on the number of keys and the number of values you want to copy.
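
Doing the key check and the value extraction in the same loop also lets you build the results for the first two operations in a single pass over dict1. This is a minimal sketch; the helper name `split_by_presence` is hypothetical and does not come from either post.

```python
def split_by_presence(dict1, dict2):
    """Split dict1 in one pass, returning (found, missing).

    found:   keys of dict1 that are in dict2, mapped to dict2's values
    missing: keys of dict1 that are not in dict2, mapped to dict1's values
    """
    found = {}
    missing = {}
    for key, value in dict1.items():
        if key in dict2:  # hash lookup, average O(1) even if dict2 is huge
            found[key] = dict2[key]
        else:
            missing[key] = value
    return found, missing


found, missing = split_by_presence({'a': 1, 'b': 2, 'c': 6},
                                   {'a': 3, 'b': 4, 'd': 5})
print(found)    # {'a': 3, 'b': 4}
print(missing)  # {'c': 6}
```

Because each membership test is a hash lookup, the work grows with len(dict1), not with len(dict2), which is relevant when dict2 is the very large one.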

To do the operations together, you would use the loops you show, possibly written as comprehensions. To separate them, you can use the fact that dict.keys() returns a set-like view backed by the dictionary itself. This lets you apply set operators directly to the views, which is typically much faster than converting the dictionaries to sets first.
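
A small sketch of the two view properties relied on here: the views support set operators (the results are ordinary sets), and they reflect later changes to the dictionary they come from. The sample data is the question's.

```python
dict1 = {'a': 1, 'b': 2, 'c': 6}
dict2 = {'a': 3, 'b': 4, 'd': 5}

keys1 = dict1.keys()  # a view of dict1's keys, not a copy
keys2 = dict2.keys()

# Views support set operators directly; the results are plain sets.
common = keys1 & keys2
only_in_1 = keys1 - keys2
print(sorted(common))     # ['a', 'b']
print(sorted(only_in_1))  # ['c']

# The view is backed by the dictionary, so later changes show through.
dict2['c'] = 7
print('c' in keys2)           # True
print(sorted(keys1 - keys2))  # []
```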

Keys of dict1 that appear in dict2 are dict1.keys() & dict2.keys(). Your two options are therefore:

{key: dict2[key] for key in dict1.keys() & dict2.keys()}

AND

{key: dict2[key] for key in dict1 if key in dict2}
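
Both options produce the same key/value pairs, but they can differ in order. This sketch (with dict1's keys deliberately reordered from the question's example) shows that only the loop version is guaranteed to follow dict1's insertion order, since the view-based version iterates over a set.

```python
dict1 = {'b': 2, 'a': 1, 'c': 6}
dict2 = {'a': 3, 'b': 4, 'd': 5}

# Option 1: compute the common keys with a set operation on the views.
via_views = {key: dict2[key] for key in dict1.keys() & dict2.keys()}

# Option 2: test membership while looping over dict1.
via_loop = {key: dict2[key] for key in dict1 if key in dict2}

# Dict equality ignores order, so the results compare equal...
print(via_views == via_loop)  # True
# ...but only option 2 preserves dict1's key order.
print(list(via_loop))  # ['b', 'a']
```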

Keys of dict1 that don't appear in dict2 are dict1.keys() - dict2.keys(). Your two options are therefore:

{key: dict1[key] for key in dict1.keys() - dict2.keys()}

AND

{key: dict1[key] for key in dict1 if key not in dict2}
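
The same comparison for the difference, using the question's data. The third variant iterates over dict1.items(), so it reads each value while looping instead of doing a second lookup dict1[key]; it is equivalent to the other two.

```python
dict1 = {'a': 1, 'b': 2, 'c': 6}
dict2 = {'a': 3, 'b': 4, 'd': 5}

# Option 1: set difference on the key views.
via_views = {key: dict1[key] for key in dict1.keys() - dict2.keys()}

# Option 2: keep the keys of dict1 that fail the membership test.
via_loop = {key: dict1[key] for key in dict1 if key not in dict2}

# Variant of option 2 that takes the value from items() directly.
via_items = {key: value for key, value in dict1.items() if key not in dict2}

print(via_views == via_loop == via_items)  # True
print(via_loop)  # {'c': 6}
```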

For the third operation, no such optimization is possible, because every key of dict1 has to be looked up in dict2:
```
{key: dict2[key] for key in dict1}
```
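
Since this comprehension raises KeyError at the first key of dict1 that dict2 lacks, you may want to check the precondition explicitly. A minimal sketch; `values_from_dict2` is a hypothetical helper name, and it relies on the subset comparison `<=` that key views support like sets.

```python
def values_from_dict2(dict1, dict2):
    """Return {key: dict2[key]} for every key of dict1.

    Assumes (and checks) that all keys of dict1 are present in dict2.
    """
    # Key views support set comparisons; <= is a subset test.
    if not dict1.keys() <= dict2.keys():
        missing = [key for key in dict1 if key not in dict2]
        raise KeyError(f'keys missing from dict2: {missing}')
    return {key: dict2[key] for key in dict1}


print(values_from_dict2({'a': 1, 'b': 2, 'c': 6},
                        {'a': 3, 'b': 4, 'c': 6, 'd': 5}))
# {'a': 3, 'b': 4, 'c': 6}
```

The subset check costs one extra pass over dict1; if a KeyError on missing keys is acceptable, the plain comprehension alone is enough.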

To find out which option is faster for the first two operations, you will have to run a benchmark on data with your own sizes and amount of key overlap. For example, if len(dict1.keys() & dict2.keys()) is much smaller than len(dict1), you will save a lot of overhead in __getitem__ by using the set-based approach for the first operation. But if most keys are present, the overhead of doing a set operation on keys() may outweigh the savings.
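
A benchmark along these lines could use the standard-library `timeit` module. This is a sketch under stated assumptions: the generator `make_data`, the integer keys, and the sizes and overlap ratios are invented for illustration and should be replaced with data shaped like yours. No timings are shown because they depend on the machine and the Python version.

```python
import timeit


def make_data(n1, n2, overlap):
    """Hypothetical test data: dict1 has n1 keys, dict2 has n2 keys,
    and int(n1 * overlap) keys are shared. Requires n2 >= shared."""
    shared = int(n1 * overlap)
    dict1 = {k: k for k in range(n1)}
    # dict2 holds the first `shared` keys of dict1 plus unrelated keys.
    dict2 = {k: -k for k in range(shared)}
    dict2.update({k: -k for k in range(n1, n1 + n2 - shared)})
    return dict1, dict2


def via_views(dict1, dict2):
    return {key: dict2[key] for key in dict1.keys() & dict2.keys()}


def via_loop(dict1, dict2):
    return {key: dict2[key] for key in dict1 if key in dict2}


if __name__ == '__main__':
    for overlap in (0.01, 0.5, 1.0):
        d1, d2 = make_data(10_000, 200_000, overlap)
        assert via_views(d1, d2) == via_loop(d1, d2)  # same result
        t_views = timeit.timeit(lambda: via_views(d1, d2), number=10)
        t_loop = timeit.timeit(lambda: via_loop(d1, d2), number=10)
        print(f'overlap={overlap:.2f}  views={t_views:.3f}s  loop={t_loop:.3f}s')
```

Varying the overlap is the point of the exercise: the reasoning above suggests low overlap favours the view-based version while high overlap may not, so measure rather than assume.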