I am building a system that recommends a book from a dataset based on what best suits the user. The problem is that instead of a single book, many books are returned. How can I fix this?
The code is this:
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd

class SuggestAudiobook:
    def __init__(self, book):
        model = KNeighborsClassifier()
        book = pd.read_csv("dataset.csv", delimiter=";")
        var2 = book.Title
        var1 = book[["audioRuntime_converted", "category_converted"]]
        var2 = var2.astype('string')
        var1 = var1.astype('int')
        model.fit(var1, var2)
        dataframe = pd.DataFrame(data={"audioRuntime_converted": book.audioRuntime_converted, "category_converted": book.category_converted})
        predictionDataframe = model.predict(dataframe)
        print("L'audiobook recommended for you is --> ", predictionDataframe)
The result is this:
L'audiobook recommended for you is -->  ['Catching Fire' 'In Charge of Moonlight' 'Catching Fire' ... 'Born a Crime' 'Born a Crime' 'Born a Crime']
I want to recommend one book from the dataset based on the input data. In this case the inputs are audioRuntime_converted and category_converted (they come from the other file that calls this function). I then search the dataset based on those two fields. I am confident the procedure is correct because I applied it in another project; the only problem is that the output contains many values instead of one.
CodePudding user response:
Your dataframe has multiple rows, and the .predict() function makes one prediction for every row of the input you pass to it.
So len(predictionDataframe) == len(dataframe)
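As a minimal sketch of that behaviour, the snippet below uses made-up toy data (the feature values and the titles "Book A" / "Book B" are assumptions for illustration, not taken from your dataset). It shows that predict returns one label per input row, and exactly one label when given a single row:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data, invented for illustration: two features per row.
X_train = [[1, 1], [2, 1], [1, 2], [8, 9], [9, 8], [9, 9]]
y_train = ["Book A", "Book A", "Book A", "Book B", "Book B", "Book B"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predicting on all six training rows returns six labels.
all_predictions = model.predict(X_train)
print(len(all_predictions) == len(X_train))

# Predicting on one row (note the double brackets) returns one label.
single_prediction = model.predict([[2, 2]])
print(len(single_prediction), single_prediction[0])
```

So to get a single recommendation, pass a single row of user attributes rather than the whole dataset.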
CodePudding user response:
model.predict(input) makes one prediction for each record in input. In your code, you appear to pass the training dataset itself to predict, so the output is an array of book titles with the same number of rows as the training labels (var2).
I have simulated a small (and deliberately obvious) dataset for the prediction:
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np
# book = pd.read_csv("dataset.csv", delimiter = ";")
df1 = pd.concat([pd.DataFrame(np.random.uniform(0, 10, (5,2))), pd.DataFrame(['Book A']*5)], axis=1)
df2 = pd.concat([pd.DataFrame(np.random.uniform(5, 15, (5,2))), pd.DataFrame(['Book B']*5)], axis=1)
book = pd.concat([df1, df2])
book.columns = ['audioRuntime_converted', 'category_converted', 'Title']
print(book)
   audioRuntime_converted  category_converted   Title
0                3.180352            1.995319  Book A
1                5.928537            9.304618  Book A
2                3.445036            5.746906  Book A
3                3.623655            2.043251  Book A
4                8.340740            9.641824  Book A
0                7.224949            7.158453  Book B
1                9.191920           10.732677  Book B
2                7.417375            6.956461  Book B
3               10.274473           14.435836  Book B
4                5.945386           13.222845  Book B
Next I do the training and prediction:
var1 = book[["audioRuntime_converted", "category_converted"]].astype('int').values #this is X_train
var2 = book.Title.astype('string') #this is y_train
model = KNeighborsClassifier()
model.fit(var1, var2)
test_list = [ [1,3], [3,6], [9,7], [10,12] ] #list of user attributes [x,y]
for user in test_list:
    prediction = model.predict([user])  # input 1 user to get 1 book recommendation
    print(f"L'audiobook recommended for user {user} is --> {prediction}")
Output:
L'audiobook recommended for user [1, 3] is --> ['Book A']
L'audiobook recommended for user [3, 6] is --> ['Book A']
L'audiobook recommended for user [9, 7] is --> ['Book B']
L'audiobook recommended for user [10, 12] is --> ['Book B']
As you can see, if a user has low attributes [x,y], the recommended book is "Book A", whereas if a user has higher attributes [x,y], the recommended book is "Book B".
Note also that passing a single user's attribute pair to model.predict (for example [[1, 3]], which is what [user] is in the loop above) returns exactly one book recommendation.
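If you only ever need one title back, one option is to wrap the call in a small helper. The sketch below is an assumption about how you might structure it: the function name recommend_one and the toy training data are hypothetical and not part of your code.

```python
from sklearn.neighbors import KNeighborsClassifier

def recommend_one(model, audio_runtime, category):
    """Return a single title for one user's (runtime, category) pair."""
    # predict expects a 2-D input: one inner list per sample.
    prediction = model.predict([[audio_runtime, category]])
    # The result is an array of length 1; return its only element.
    return str(prediction[0])

# Toy data, invented for illustration only.
X_train = [[1, 3], [2, 2], [3, 1], [9, 7], [10, 12], [11, 9]]
y_train = ["Book A"] * 3 + ["Book B"] * 3

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

title = recommend_one(model, 2, 3)
print(f"Recommended audiobook: {title}")
```

The helper returns a plain string instead of a one-element array, so the caller can print it or store it directly.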
Edit: Here is the difference between the code above and the code from your other project.
In this project, each dictionary value is a whole column (a pandas Series), so the DataFrame has one row per book and predict returns one prediction per row:
pd.DataFrame(data={"audioRuntime_converted": book.audioRuntime_converted, "category_converted": book.category_converted})
# schematically: {"audioRuntime_converted": <Series with many values>, "category_converted": <Series with many values>}
# that's why the output is a series of predictions
In your other project, each dictionary value is apparently a list wrapping a single number, so the DataFrame has exactly one row:
pd.DataFrame(data={"audioRuntime_converted": [book.audioRuntime_converted], "average_rating_converted": [book.average_rating_converted], "ratings_count_converted": [book.ratings_count_converted]})
# schematically: {"audioRuntime_converted": [<single number>], "average_rating_converted": [<single number>], "ratings_count_converted": [<single number>]}
# that's why there is only 1 prediction
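One way to check the difference between the two constructions is to compare the shapes of the resulting DataFrames. This is a small sketch with made-up values; the three-row book frame below is an assumption that stands in for your CSV file.

```python
import pandas as pd

# Stand-in for the dataset read from the CSV (values invented).
book = pd.DataFrame({
    "audioRuntime_converted": [3, 5, 8],
    "category_converted": [1, 9, 9],
})

# Passing whole columns (Series) builds one row per row of `book`.
many_rows = pd.DataFrame(data={
    "audioRuntime_converted": book.audioRuntime_converted,
    "category_converted": book.category_converted,
})

# Passing one-element lists of scalars builds exactly one row.
one_row = pd.DataFrame(data={
    "audioRuntime_converted": [5],
    "category_converted": [9],
})

print(many_rows.shape)  # (3, 2)
print(one_row.shape)    # (1, 2)
```

Since predict returns one label per input row, the first frame yields three predictions and the second yields one.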
