I'm building a predictive model for whether a car is a sports car or not. The model works fine, but I would like to join the predicted values back to the unique IDs so that I can visualize the proportions, etc. I have two dataframes:
- Testing with labelled data - test_cars
| CarId | Feature1 | Feature2 | IsSportCar |
|---|---|---|---|
| 1 | 90 | 150 | True |
| 2 | 60 | 200 | False |
| 3 | 560 | 500 | True |
- Unlabelled data to be predicted - cars_new
| CarId | Feature1 | Feature2 |
|---|---|---|
| 4 | 88 | 666 |
| 5 | 55 | 458 |
| 6 | 150 | 125 |
from sklearn.neighbors import KNeighborsClassifier
# Create arrays for the features and the response variable
y = test_cars['IsSportCar'].values
X = test_cars.drop(['IsSportCar','CarId'], axis=1).values
X_new = cars_new.drop(['CarId'], axis=1).values
# Create a k-NN classifier with 10 neighbors
knn = KNeighborsClassifier(n_neighbors=10)
# Fit the classifier to the data
knn.fit(X,y)
y_pred = knn.predict(X_new)
The model works fine, but I would like to join the predicted values back to each car (CarId), so that the cars_new dataframe is output with a predicted "IsSportCar" column:
| CarId | Feature1 | Feature2 | IsSportCar |
|---|---|---|---|
| 4 | 88 | 666 | False |
| 5 | 55 | 458 | True |
| 6 | 150 | 125 | True |
Any ideas how to join the predicted values back to the unique IDs?
CodePudding user response:
cars_new['IsSportCar'] = y_pred
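I assume y_pred is the variable you want to put into cars_new? This works because knn.predict returns one prediction per row of X_new, in the same order as the rows of cars_new, so a plain column assignment lines each prediction up with its CarId.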
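For reference, here is a minimal end-to-end sketch of that assignment, using the three-row tables from the question as toy data. A few things in it are my own assumptions rather than part of the question: n_neighbors=1 (the toy training set is too small for 10 neighbors), the names feature_cols, cars_pred and sport_share, and the use of assign() instead of in-place assignment so that cars_new itself stays unchanged.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Toy data copied from the tables in the question
test_cars = pd.DataFrame({
    "CarId": [1, 2, 3],
    "Feature1": [90, 60, 560],
    "Feature2": [150, 200, 500],
    "IsSportCar": [True, False, True],
})
cars_new = pd.DataFrame({
    "CarId": [4, 5, 6],
    "Feature1": [88, 55, 150],
    "Feature2": [666, 458, 125],
})

# Hypothetical helper list: the columns used as model inputs
feature_cols = ["Feature1", "Feature2"]
X = test_cars[feature_cols].values
y = test_cars["IsSportCar"].values
X_new = cars_new[feature_cols].values

# Assumption: n_neighbors=1 only because the toy training set has three rows
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)

# One prediction per row of X_new, in the same order as cars_new
y_pred = knn.predict(X_new)

# assign() returns a new dataframe; the NumPy array is attached by position
cars_pred = cars_new.assign(IsSportCar=y_pred)

# Share of cars predicted to be sports cars (mean of a boolean column)
sport_share = cars_pred["IsSportCar"].mean()

print(cars_pred)
print(sport_share)
```

Because y_pred is a NumPy array, pandas attaches it by position, so the only requirement is that X_new was built from cars_new without filtering or reordering rows. If you wrap the predictions in a pd.Series first, pass index=cars_new.index, since a Series is aligned by index label rather than by position.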
