I'm learning from a Python course by Tech With Tim. I used his code for a basic linear regression algorithm and made my own small data sheet to try it out, but whenever I import my data sheet I get the following error:
KeyError: 'xvalue'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
-> 2900 raise KeyError(key) from err
2901
2902 if tolerance is not None:
KeyError: 'xvalue'
My code is as follows:
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib
import tensorflow.compat.v2.feature_column as fc
import tensorflow as tf
# Load dataset.
train = "https://docs.google.com/spreadsheets/d/1qQfNL2ePWVsOiqSNgJu_ZwmI9KFwFTrtdb1boqIKFZQ/edit?usp=sharing"
eval = "https://docs.google.com/spreadsheets/d/1Ip1Key9NYx3boTUFkPUOzEWdtUBMeGJJEY3MbSpVkUo/edit?usp=sharing"
dftrain = pd.read_csv(train, sep='\t,\s*') # training data
dfeval = pd.read_csv(eval, sep='\t,\s*') # testing data
y_train = dftrain.pop('xvalue')
y_eval = dfeval.pop('xvalue')
I'm using Google's Python notebook (Colab) to write this. Any help would be greatly appreciated!
Answer:
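Your URLs point at each spreadsheet's /edit page, which returns an HTML page rather than CSV data, so the DataFrame that pandas builds has no 'xvalue' column. To read the contents of a Google Sheets spreadsheet, export it as CSV instead; you can use the suggestion in this answer.
Since you're using pandas, you don't need the requests package: pd.read_csv can read directly from a URL.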
So you can define this function:
def get_docs_url(doc_id):
    # Build a URL that exports the Google Sheets document as CSV
    url = f"https://docs.google.com/spreadsheet/ccc?key={doc_id}&output=csv"
    return url
And use it like this:
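train = get_docs_url('1qQfNL2ePWVsOiqSNgJu_ZwmI9KFwFTrtdb1boqIKFZQ')
dftrain = pd.read_csv(train)  # training data
eval_url = get_docs_url('1Ip1Key9NYx3boTUFkPUOzEWdtUBMeGJJEY3MbSpVkUo')
dfeval = pd.read_csv(eval_url)  # testing data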
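If you'd rather paste the sharing links from your question than copy the ID out by hand, a small helper can pull it out of the URL. This is a hypothetical helper (extract_doc_id is not part of pandas or any Google API), and it assumes the ID is the path segment right after /d/, as in the links above:

```python
import re


def extract_doc_id(sharing_url):
    """Return the spreadsheet ID from a URL like .../spreadsheets/d/<ID>/edit..."""
    # Hypothetical helper: assumes the ID is the path segment right after "/d/"
    match = re.search(r"/d/([A-Za-z0-9_-]+)", sharing_url)
    if match is None:
        raise ValueError(f"No document ID found in {sharing_url!r}")
    return match.group(1)


train_share = (
    "https://docs.google.com/spreadsheets/d/"
    "1qQfNL2ePWVsOiqSNgJu_ZwmI9KFwFTrtdb1boqIKFZQ/edit?usp=sharing"
)
print(extract_doc_id(train_share))
```

Its return value can then be passed to get_docs_url in place of the hard-coded ID.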
PS: I removed sep='\t,\s*' because the exported file is an ordinary comma-separated file, which read_csv parses with its default separator (',').
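The separator matters on its own, too: even with a URL that returns real CSV, the regex from the question would still lead to the same KeyError. The sketch below uses a small in-memory string as a stand-in for the exported sheet (the two-column layout is an assumption) and compares the default separator with the question's regex:

```python
import io

import pandas as pd

# Stand-in for the text the CSV export returns (assumed two columns)
csv_text = "xvalue,yvalue\n1,2\n3,6\n"

# Default separator (',') splits the header into two columns
good = pd.read_csv(io.StringIO(csv_text))

# The regex from the question expects a tab before each comma, so plain
# commas never match and every line stays in a single column
bad = pd.read_csv(io.StringIO(csv_text), sep=r"\t,\s*", engine="python")

print(list(good.columns))  # ['xvalue', 'yvalue']
print(list(bad.columns))   # ['xvalue,yvalue']

# Popping 'xvalue' from the badly parsed frame reproduces the original error
try:
    bad.pop("xvalue")
except KeyError as err:
    print("KeyError:", err)
```

Passing engine="python" explicitly just avoids the warning pandas emits when it falls back to that engine for a regex separator.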
PPS: The test dataset has different column names from the training dataset, so I also needed to add dfeval = dfeval.rename(columns={'X': 'xvalue', 'Y': 'yvalue'}) to get it to work.
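Here is a minimal sketch of that renaming step, using a hypothetical in-memory eval sheet whose headers are X and Y (only the header names come from the rename above; the values are made up for illustration). Renaming before pop keeps the label column name identical in both datasets:

```python
import io

import pandas as pd

# Hypothetical eval sheet whose headers are 'X' and 'Y' instead of 'xvalue'/'yvalue'
eval_csv = "X,Y\n5,10\n7,14\n"
dfeval = pd.read_csv(io.StringIO(eval_csv))

# Align the column names with the training data before popping the label
dfeval = dfeval.rename(columns={"X": "xvalue", "Y": "yvalue"})
y_eval = dfeval.pop("xvalue")

print(y_eval.tolist())       # [5, 7]
print(list(dfeval.columns))  # ['yvalue']
```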
