I am trying to follow chapter 3 of Hands-On Machine Learning with Scikit-Learn and TensorFlow, which covers classification of the MNIST data. The following commands run without problems in a Jupyter notebook:
>>> from sklearn.datasets import fetch_openml
>>> mnist = fetch_openml('mnist_784', version=1)
>>> mnist.keys()
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'details',
'categories', 'url'])
>>> X, y = mnist["data"], mnist["target"]
>>> X.shape
(70000, 784)
>>> y.shape
(70000,)
The following command throws an error:
>>> some_digit = X[0]
Error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
~/anaconda3/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/anaconda3/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-43-348a6e96ae02> in <module>
----> 1 some_digit = X[0]
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
3456 if self.columns.nlevels > 1:
3457 return self._getitem_multilevel(key)
-> 3458 indexer = self.columns.get_loc(key)
3459 if is_integer(indexer):
3460 indexer = [indexer]
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 0
It is hard for me to understand what the actual error is, as I have not come across a similar one for such a simple assignment. What is causing the issue?
Answer:
X is a DataFrame, so X[0] looks up a column labeled 0, not a row. No such column exists (the columns are named pixel1 through pixel784), hence the KeyError. If you want the first row of your DataFrame, you have to use .loc (selection by index label) or .iloc (selection by position). In your case both methods are equivalent, but only because the index is numeric, starts at 0, and is contiguous:
# Extract the first row as a Series
>>> X.loc[0]
pixel1 0.0
pixel2 0.0
pixel3 0.0
pixel4 0.0
pixel5 0.0
...
pixel780 0.0
pixel781 0.0
pixel782 0.0
pixel783 0.0
pixel784 0.0
Name: 0, Length: 784, dtype: float64
# Extract a pixel by label
>>> X.loc[0, 'pixel7']
0.0
# Extract the same pixel by position
>>> X.iloc[0, 6]
0.0
Update
iloc is probably the more appropriate choice here, since you want the first row by position.
If you only need positional indexing, consider working with NumPy arrays instead of pandas objects by converting the data and target to arrays:
X, y = mnist["data"].to_numpy(), mnist["target"].to_numpy()
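As a rough illustration of why the conversion helps, the sketch below applies to_numpy() to a small synthetic frame and series. X_frame, y_series, and their values are made up for the example and only mimic the shape of the real data. After conversion, the original X[0] style of indexing returns the first row:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for mnist["data"] and mnist["target"].
X_frame = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4),
    columns=["pixel1", "pixel2", "pixel3", "pixel4"],
)
y_series = pd.Series(["5", "0", "4"], name="class")

# Convert both to NumPy arrays, as in the answer above.
X_arr, y_arr = X_frame.to_numpy(), y_series.to_numpy()

# On a 2-D array, integer indexing selects a row, so this no longer fails.
some_digit = X_arr[0]

# With the real 784-column data, some_digit.reshape(28, 28) would give the image.
# fetch_openml also accepts as_frame=False, which returns arrays directly:
#   mnist = fetch_openml('mnist_784', version=1, as_frame=False)
```

The as_frame=False route avoids building the DataFrame at all, which may be preferable if the rest of the code expects arrays.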
