Below is an example where the dataclass "fails" to be hashable but the normal class produces a hash value fine.
Why does hashing the normal class work, given that it also holds an np.ndarray, which should be unhashable?
import numpy as np
from dataclasses import dataclass
class X:
    def __init__(self):
        # store the array on the instance (np.array, not np.ndarray, builds an array from a list)
        self.x = np.array([0, 1, 1, 0, 110])
@dataclass(frozen=True, eq=True)
class Y:
    name: str
    unit_price: np.ndarray
a = X()
print(a.__hash__())
b = Y(10, np.array([0,1,2]))
print(b.__hash__())
Output:
8787234869218
Traceback (most recent call last):
  File "t.py", line 37, in <module>
    print(b.__hash__())
  File "<string>", line 3, in __hash__
TypeError: unhashable type: 'numpy.ndarray'
Answer:
Because you used:
@dataclass(frozen=True, eq=True)
The rules here are described in the docs:
If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
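A minimal sketch of those three rules, using hypothetical classes FrozenEq, MutableEq and NoEq (the names are only for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True, eq=True)
class FrozenEq:
    # eq and frozen both true: dataclass() generates a field-based __hash__
    x: int

@dataclass(eq=True)
class MutableEq:
    # eq true, frozen false (the default): __hash__ is set to None
    x: int

@dataclass(eq=False)
class NoEq:
    # eq false: __hash__ is left untouched, so object.__hash__ (identity) is used
    x: int

print(hash(FrozenEq(1)) == hash(FrozenEq(1)))  # True: equal fields give equal hashes
print(MutableEq.__hash__ is None)              # True: instances are unhashable
print(NoEq.__hash__ is object.__hash__)        # True: inherited identity hash
```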
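So the dataclass generated a __hash__ that hashes the values of the fields. That can't work here, because one of the fields is a numpy.ndarray and NumPy arrays are unhashable; the TypeError in your traceback is raised from inside that generated method (the File "<string>" frame).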
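In CPython the generated method essentially returns hash() of a tuple of the field values, and hashing a tuple hashes every element, so a single unhashable field breaks it. The sketch below reproduces the failure without NumPy: it assumes a plain list stands in for the np.ndarray (both are unhashable), and the class Item and helper try_hash are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True, eq=True)
class Item:
    name: str
    unit_price: Sequence[int]  # a list here plays the role of np.ndarray

def try_hash(value):
    """Return hash(value), or the TypeError message if value is unhashable."""
    try:
        return hash(value)
    except TypeError as exc:
        return f"TypeError: {exc}"

item = Item("widget", [0, 1, 2])
print(try_hash(item))                          # e.g. TypeError: unhashable type: 'list'
print(try_hash((item.name, item.unit_price)))  # same error, raised by the tuple hash
print(try_hash(Item("widget", (0, 1, 2))))     # an int: a tuple field is hashable
```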
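Your regular class X, on the other hand, inherits object.__hash__, which hashes based on identity: roughly id(self), although not exactly. It never looks at the instance's attributes, so it doesn't matter whether the instance holds an ndarray or anything else.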
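To see that the identity-based hash ignores the attributes entirely, here is a sketch with a hypothetical class Holder that stores an unhashable list (again standing in for the array):

```python
class Holder:
    def __init__(self, data):
        self.data = data  # unhashable attribute; object.__hash__ never inspects it

a = Holder([0, 1, 2])
b = Holder([0, 1, 2])

print(hash(a) == object.__hash__(a))  # True: the inherited identity-based hash
print(hash(a) == hash(a))             # True: stable for the object's lifetime
print(a == b)                         # False: default __eq__ compares identity too
```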
The two won't behave the same. With the dataclass, you told the dataclass code generator to make an "immutable" type with a corresponding __eq__ that is based on the attributes of the object; the generated hash is consistent with that __eq__, so it is also based on the values of the attributes. With the regular class, both the hash and equality are based on the identity of the object.
To illustrate these differences:
>>> from dataclasses import dataclass
>>> @dataclass(frozen=True, eq=True)
... class Point1:
...     x: int
...     y: int
...
>>> points = set()
>>> points.add(Point1(0, 0))
>>> Point1(0, 0) in points
True
Notice that a different Point1 object with the same values is found in the set.
However, here is how identity-based hashing and equality behave:
>>> class Point2:
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...
>>> p = Point2(0, 0)
>>> points = set()
>>> points.add(p)
>>> Point2(0, 0) in points
False
>>> p in points
True
