I have a dataframe for which I predicted result using XGBoost (all the necessary imports are made, so I omit them here):
studentId  testId  result  Length  Words  picture
s1         t1      0       10      8.50   0
s1         t2      0       11      9.80   1
s1         t3      1       11      10.40  1
s2         t2      0       11      9.80   1
s2         t4      1       60      9.99   0
s3         t7      1       40      6.45   0
cols_to_drop = ['testId', 'studentId']
df.drop(cols_to_drop, axis=1, inplace=True)
X = df.drop('result', axis=1)
y = df['result']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
model = XGBClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
I can also predict result in a different way with the surprise library, using only a subset of this dataframe's columns rather than all the features above:
studentId  testId  result
s1         t1      0
s1         t2      0
s1         t3      1
s2         t2      0
s2         t4      1
s3         t7      1
reader = Reader(rating_scale=(0, 1))
data = Dataset.load_from_df(df_small[['studentId', 'testId', 'result']], reader)
trainset, testset = train_test_split(data, test_size=0.25)
algo = KNNWithMeans()
algo.fit(trainset)
test = algo.test(testset)
test = pd.DataFrame(test)
test.drop("details", inplace=True, axis=1)
test.columns = ['userId', 'questionId', 'actual', 'cf_predictions']
Now I want to create a model that combines the two and assigns a different weight to each. I wrapped each approach above in a function and then wrote a third function that calls both:
def model_1(df):
    cols_to_drop = ['testId', 'studentId']
    new_df=df.drop(cols_to_drop, axis=1, inplace=True)
    X = new_df.drop('result', axis=1)
    y = new_df['result']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
    model = XGBClassifier()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return y_test, y_pred
def model_2(df):
    reader = Reader(rating_scale=(0, 1))
    data = Dataset.load_from_df(df[['studentId', 'testId', 'result']], reader)
    trainset, testset = train_test_split(data, test_size=0.25)
    algo = KNNWithMeans()
    algo.fit(trainset)
    test = algo.test(testset)
    test = pd.DataFrame(test)
    test.drop("details", inplace=True, axis=1)
    test.columns = ['studentId', 'testId', 'actual', 'cf_predictions']
    return test
def merged_models(df):
    first_model = model_1(df)
    second_model = model_2(df)
    prediction = 0.5 * first_model + 0.5 * second_model # weights example
    return prediction
The first two approaches work on their own, but merged_models(df) fails as soon as it calls model_1, with AttributeError: 'NoneType' object has no attribute 'drop' raised at X = new_df.drop('result', axis=1). The code is probably a mess, but is there a way to combine two such different models and also evaluate the resulting "hybrid"?
CodePudding user response:
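With inplace=True, df.drop modifies the DataFrame in place and returns None, so new_df ends up being None and the next line fails. You don't need a new name at all: either call df.drop(..., inplace=True) without assigning the result, or drop inplace=True and keep the assignment.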
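A quick way to see the behaviour is to try it on a toy frame (the data below is made up for illustration, using the column names from the question). Note the side effect as well: because the drop happens in place, the caller's df loses studentId and testId, so a later model_2(df) on the same object would fail with a KeyError when it selects those columns.

```python
import pandas as pd

# Toy frame with the question's column names; the values are illustrative only.
df = pd.DataFrame({
    "studentId": ["s1", "s2"],
    "testId": ["t1", "t2"],
    "result": [0, 1],
})
cols_to_drop = ["testId", "studentId"]

# Without inplace, drop returns a new DataFrame and leaves df untouched.
new_df = df.drop(cols_to_drop, axis=1)
print(list(new_df.columns))  # only 'result' is left
print(list(df.columns))      # df still has all three columns

# With inplace=True, drop mutates df itself and returns None.
returned = df.drop(cols_to_drop, axis=1, inplace=True)
print(returned)              # None
print(list(df.columns))      # df has now lost the two id columns
```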
CodePudding user response:
As @TimRoberts pointed out, df.drop with inplace=True returns None, which is what gets assigned to new_df. You can either leave inplace at its default of False and keep the assignment, or keep inplace=True and not assign the result to new_df.
This will work:
new_df = df.drop(cols_to_drop, axis=1)
And so will this:
new_df = df.copy()
new_df.drop(cols_to_drop, axis=1, inplace=True)
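Once model_1 runs, the weighted line in merged_models still won't work as written: model_1 returns a tuple and model_2 returns a DataFrame, and the two functions split the data independently, so their test rows don't line up. One possible approach, sketched below under assumptions, is to have both models produce a score for the same held-out rows, join them on studentId and testId, and take a weighted average. The frames xgb_out and cf_out, the column xgb_proba, and the function blend are hypothetical names with made-up values; in practice xgb_proba could come from model.predict_proba(X_test)[:, 1] and cf_predictions from the frame built in model_2, provided both models are evaluated on the same rows.

```python
import pandas as pd

# Hypothetical per-row outputs of the two models on the SAME held-out rows.
# The scores are invented for illustration, not real model output.
xgb_out = pd.DataFrame({
    "studentId": ["s1", "s2", "s2", "s3"],
    "testId": ["t3", "t2", "t4", "t7"],
    "actual": [1, 0, 1, 1],
    "xgb_proba": [0.8, 0.3, 0.6, 0.4],
})
cf_out = pd.DataFrame({
    "studentId": ["s1", "s2", "s2", "s3"],
    "testId": ["t3", "t2", "t4", "t7"],
    "cf_predictions": [0.6, 0.2, 0.7, 0.5],
})

def blend(xgb_out, cf_out, w_xgb=0.5, threshold=0.5):
    """Weighted average of the two scores, aligned on (studentId, testId)."""
    # Inner join keeps only rows that both models scored.
    merged = xgb_out.merge(cf_out, on=["studentId", "testId"], how="inner")
    merged["hybrid_score"] = (
        w_xgb * merged["xgb_proba"] + (1 - w_xgb) * merged["cf_predictions"]
    )
    # Turn the blended score into a 0/1 prediction.
    merged["hybrid_pred"] = (merged["hybrid_score"] >= threshold).astype(int)
    return merged

merged = blend(xgb_out, cf_out, w_xgb=0.5)
# Evaluate the hybrid like any other classifier, here with plain accuracy.
accuracy = (merged["hybrid_pred"] == merged["actual"]).mean()
print(merged[["studentId", "testId", "actual", "hybrid_score", "hybrid_pred"]])
print(accuracy)
```

The weight w_xgb is just another parameter here, so it can be tuned on a validation split rather than fixed at 0.5.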
