What is the hash function for a normal class and why don't dataclasses have the same hash funct

Time:02-10

The following is an example in which the dataclass "fails" to be hashable, but the normal class produces a hash value without error.

Why does this work given that the normal class has the np.ndarray as well and should be unhashable?

import numpy as np
from dataclasses import dataclass

class X:
    def __init__(self):
        self.x = np.array([0, 1, 1, 0, 110])  # store the array on the instance

@dataclass(frozen=True, eq=True)
class Y:
    name: str
    unit_price: np.ndarray 

a = X()
print(a.__hash__()) 

b = Y(10, np.array([0,1,2]))
print(b.__hash__())

Output:

8787234869218
Traceback (most recent call last):
  File "t.py", line 37, in <module>
    print(b.__hash__())
  File "<string>", line 3, in __hash__
TypeError: unhashable type: 'numpy.ndarray'

CodePudding user response:

Because you used:

@dataclass(frozen=True, eq=True)

The rules here are described in the docs:

If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
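
The three cases in that passage can be checked directly. This is a minimal sketch; the class names FrozenEq, MutableEq and NoEq are made up for illustration and are not part of the question.

```python
from dataclasses import dataclass


@dataclass(eq=True, frozen=True)
class FrozenEq:
    value: int


@dataclass(eq=True, frozen=False)
class MutableEq:
    value: int


@dataclass(eq=False)
class NoEq:
    value: int


# eq=True, frozen=True: a value-based __hash__ is generated,
# so equal instances hash equally.
same_hash = hash(FrozenEq(1)) == hash(FrozenEq(1))

# eq=True, frozen=False: __hash__ is set to None (unhashable).
mutable_hash_attr = MutableEq.__hash__

# eq=False: __hash__ is left untouched, so object.__hash__ is inherited.
inherits_object_hash = NoEq.__hash__ is object.__hash__

print(same_hash, mutable_hash_attr, inherits_object_hash)
```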

The dataclass generated a __hash__ method that hashes the values of the object's fields. That method raises TypeError here because one of the fields is a numpy.ndarray, which is itself unhashable.
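
To make the failure concrete, here is a small sketch that uses a plain list as a stand-in for the numpy.ndarray (an assumption, so the example runs without NumPy; a list is also unhashable). The classes Item and ItemByName are hypothetical. The second class shows one possible way around the error, assuming it is acceptable to leave the array out of both equality and hashing: dataclasses.field(compare=False).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, eq=True)
class Item:
    name: str
    prices: list  # stand-in for np.ndarray; also unhashable


item = Item("widget", [0, 1, 2])

# The generated __hash__ hashes a tuple of the fields, roughly
# hash((self.name, self.prices)), so it fails on the unhashable field.
try:
    hash(item)
    error_message = None
except TypeError as exc:
    error_message = str(exc)


@dataclass(frozen=True, eq=True)
class ItemByName:
    name: str
    # compare=False excludes this field from __eq__ and, by default,
    # from the generated __hash__ as well.
    prices: list = field(compare=False)


# Only `name` takes part in the hash, so these two hash equally.
by_name_hash_ok = hash(ItemByName("widget", [0, 1, 2])) == hash(ItemByName("widget", [9]))

print(error_message, by_name_hash_ok)
```

Whether ignoring the array is correct depends on what equality should mean for the type; if the array contents matter, this option does not fit.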

By default, user-defined classes inherit object.__hash__, which hashes based on identity. In CPython the value is derived from id(self), although it is not exactly id(self).
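
A short sketch of that default behavior, using a hypothetical class Plain with a list attribute standing in for the array (an assumption, to avoid needing NumPy). The attribute is never consulted, which is why the normal class in the question hashes without error.

```python
class Plain:
    def __init__(self):
        # Unhashable attribute; the inherited hash never looks at it.
        self.data = [0, 1, 2]


a = Plain()
b = Plain()

# hash() dispatches to the inherited object.__hash__.
uses_object_hash = hash(a) == object.__hash__(a)

# Default equality is identity: an object equals only itself.
equal_to_self = a == a
equal_to_other = a == b

print(uses_object_hash, equal_to_self, equal_to_other)
```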

The two classes behave differently. In the first case, you told the dataclass code generator to make an "immutable" type with an __eq__ based on the attributes of the object, so the generated hash is consistent with that __eq__ and is also based on the attribute values. In the second case, both the hash and equality are based on the identity of the object.

To illustrate these differences:

>>> from dataclasses import dataclass
>>> @dataclass(frozen=True, eq=True)
... class Point1:
...     x: int
...     y: int
...
>>> points = set()
>>> points.add(Point1(0, 0))
>>> Point1(0, 0) in points
True

Notice that a different Point1 object with the same value is found in the set.

However, here is how identity-based hashing and equality behave:

>>> class Point2:
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...
>>> p = Point2(0, 0)
>>> points = set()
>>> points.add(p)
>>> Point2(0, 0) in points
False
>>> p in points
True