I'm trying to run this code on Google colab, I got the ValueError: x and y must be the same size, I've tried multiple ways but none of them worked.
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width','class']
df = pd.read_csv('Iris.data.csv', header=None, names=columns)
X = np.array(df.iloc[:, 0:4])
y = np.array(df['class'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
plt.scatter(X,y)
plt.show()
CodePudding user response:
I think your problem might be that you haven't thought enough about what kind of scatter plot you are trying to make. The X array contains 150 samples of 4 parameters. The y array contains the classes of each data sample. How were you expecting the scatter plot to look? Remember that a scatter plot can only plot data in two dimensions, not 4!
I don't know what your 'Iris.data.csv' file contains but I used the copy of the iris data set from scikit-learn 
