I was studying PyTorch's Dataset class. From what I knew beforehand, we need to inherit from torch.utils.data.Dataset every time we create a CustomDataset class of our own, and we then need to override the __len__ and __getitem__ methods as needed.
But I learned that inheriting isn't always necessary: we can create our CustomDataset class with __len__ and __getitem__ methods without inheriting from torch.utils.data.Dataset, and the behaviour of an instance of the custom dataset remains pretty much the same (I tested it myself).
That is to say, len(cust_data) returns the length of the dataset we pass when creating our cust_data instance, and we can even index cust_data like cust_data[0], which returns whatever the __getitem__ method in our CustomDataset class returns.
My questions are -
What is the need for inheriting when we are just as fine without it, and if we are not, what functionality do we miss out on if we do not inherit? When is inheriting recommended and when is it not? (The official docs recommend always inheriting.)
When not inheriting, how does the instance know it needs to call the __getitem__ method when it is indexed?
Any answers appreciated.
CodePudding user response:
Take a look at the source code for torch.utils.data.Dataset: it serves as an abstract base class whose own __getitem__ raises NotImplementedError, so every class inheriting from it is expected to implement __getitem__. In other words, you don't "need" to inherit from Dataset: as long as __getitem__ is properly implemented, your dataset class will work fine. The reason doing so has become common practice is that it indicates to a third party (e.g. some other code that uses your dataset class, or someone else reading your code) that the class in question has __getitem__ implemented. It provides a common interface for PyTorch datasets.
Executing someClass[i] will automatically call someClass.__getitem__ with parameter i (and will throw an error if __getitem__ is not implemented). This is a Python built-in feature and has nothing to do with whichever base class you are inheriting from. You can Google "dunder methods" to learn more about these special behaviors.
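
To make the dunder-method point concrete, here is a minimal sketch in plain Python (no PyTorch involved). The class names PlainDataset and NoIndexing are hypothetical and exist only for illustration. It shows that len() and indexing work on any class that defines __len__ and __getitem__, and that indexing an object without __getitem__ raises a TypeError.

```python
class PlainDataset:
    """Hypothetical dataset-like class that inherits from nothing special."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        # Python calls this for len(obj)
        return len(self.samples)

    def __getitem__(self, index):
        # Python calls this for obj[index]
        return self.samples[index]


class NoIndexing:
    """Hypothetical class that defines no __getitem__ at all."""


data = PlainDataset([10, 20, 30])

length = len(data)   # dispatches to PlainDataset.__len__
first = data[0]      # dispatches to PlainDataset.__getitem__
# Roughly what the interpreter does for data[0]: look the method up on the type
same = type(data).__getitem__(data, 0)

try:
    NoIndexing()[0]
    error_name = None
except TypeError as exc:
    # Objects without __getitem__ are "not subscriptable"
    error_name = type(exc).__name__

print(length, first, same, error_name)  # prints: 3 10 10 TypeError
```

None of this depends on a base class, which is why a dataset written this way behaves the same under len() and indexing whether or not it inherits from torch.utils.data.Dataset.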
