ValueError: could not convert string to float: 'Virus'

Time:02-01

How do I fix the problem below? The data is the video game sales dataset available on Kaggle.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv('vgsales.csv') X = df.drop(columns=['NA_Sales' , 'EU_Sales' , 'JP_Sales' , 'Other_Sales' , 'Global_Sales'])
y = df.drop(columns= ['Rank' , 'Year' , 'Platform' , 'Year' , 'Genre' , 'Publisher' ])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
prediction = model.predict(X_test)

CodePudding user response:

The error says that the string 'Virus' could not be converted to a float because it does not represent a number. You will probably have to validate or encode your data first. Where exactly does the error occur?
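One way to start that validation is to list the columns that hold text rather than numbers. This is only a minimal sketch: the small DataFrame is made up to mimic a few of the columns and stands in for the real file.

```python
import pandas as pd

# Made-up rows standing in for pd.read_csv('vgsales.csv').
df = pd.DataFrame({
    "Rank": [1, 2],
    "Name": ["Game A", "Game B"],
    "Year": [2006, 1985],
})

# Columns whose dtype is not numeric cannot be passed to the model as they are.
text_columns = df.select_dtypes(exclude="number").columns.tolist()
print(text_columns)  # ['Name']
```

Any column that shows up in this list needs to be encoded or dropped before calling fit().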

CodePudding user response:

The exception is clearly not the real problem. You are using ML without explaining what you really want. DecisionTreeClassifier is a classifier: given input data, the model tries to determine the class that data belongs to.

If I load your data:

>>> X.columns  # the input data (features)
['Rank', 'Name', 'Platform', 'Year', 'Genre', 'Publisher']

>>> y.columns  # the output data (target)
['Name', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']

X is not prepared for machine learning and y does not look like a target, so your data is unusable as it stands.

So what do you want to find with this dataset?
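To illustrate what prepared data could look like, here is a rough sketch that assumes the goal is to predict Genre. That goal is my assumption, not something stated in the question. The rows are made up and only mimic the column layout; the text features are one-hot encoded with pd.get_dummies and the free-text Name column is left out.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up rows that only mimic the column layout of the dataset.
df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D",
             "Game E", "Game F", "Game G", "Game H"],
    "Platform": ["Wii", "NES", "Wii", "DS", "NES", "DS", "Wii", "NES"],
    "Year": [2006, 1985, 2008, 2006, 1989, 2005, 2009, 1990],
    "Genre": ["Sports", "Platform", "Racing", "Sports",
              "Puzzle", "Racing", "Sports", "Platform"],
    "Publisher": ["P1", "P1", "P2", "P2", "P1", "P3", "P3", "P1"],
})

# Assumed target: a single column holding one class label per row.
y = df["Genre"]

# Features: skip the free-text Name and one-hot encode the other text columns,
# so every feature column is numeric or boolean.
X = pd.get_dummies(df[["Platform", "Year", "Publisher"]],
                   columns=["Platform", "Publisher"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
prediction = model.predict(X_test)
```

With the real file you may also have to handle missing values before fitting, and a different target would call for a different setup, for example a regressor if you want to predict a sales figure.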

CodePudding user response:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv('vgsales.csv') X = df.drop(columns=['NA_Sales' , 'EU_Sales' , 'JP_Sales' , 'Other_Sales' , 'Global_Sales'])
y = df.drop(columns= ['Rank' , 'Year' , 'Platform' , 'Year' , 'Genre' , 'Publisher' ])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
prediction = model.predict(X_test)

gives

df = pd.read_csv('vgsales.csv') X = df.drop(columns=['NA_Sales' , 'EU_Sales' , 'JP_Sales' , 'Other_Sales' , 'Global_Sales'])
                                    ^
SyntaxError: invalid syntax

One solution is to simply put X = df.drop(columns=['NA_Sales' , 'EU_Sales' , 'JP_Sales' , 'Other_Sales' , 'Global_Sales']) on a new line.

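Then convert the string values to numbers. Note that float() only works on strings that represent a number, such as '1.5'; a text value such as 'Virus' has to be encoded or dropped instead.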
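A caveat on float(): it only parses strings that actually look like numbers, which is exactly why 'Virus' triggers the ValueError. The small helper below (to_float_or_none is a hypothetical name used only for this illustration) shows the difference.

```python
def to_float_or_none(text):
    """Return float(text), or None when text does not represent a number."""
    try:
        return float(text)
    except ValueError:
        return None


print(to_float_or_none("41.49"))  # 41.49
print(to_float_or_none("Virus"))  # None
```

So float() helps with numeric values stored as text, but real text columns need a different treatment.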
