Is there a way to call predict on a selection of rows from a pandas DataFrame? As an example:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(X, y)  # X, y: training data (not shown)
selection = [True, True, False, False, True, False]
data = pd.DataFrame.from_dict(
    {
        "A": [1, 5, 3, 6, 5, 7],
        "B": ["a", "b", "a", "a", "b", "b"],
        "c": [5, 7, 4, 6, 5, 2],
    }
)
clf.predict(data[selection])
The idea is to use the predict method of the classifier only on the rows where selection is True while retaining the rows where selection is False as NaN. In this case the output should be something like:
[1, 0, NaN, NaN, 1, NaN]
Using clf.predict(data[selection]) I get predictions only for the selected rows, so the result no longer lines up with the rows of the original DataFrame.
Answer:
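You can store the mask as a column and apply a row-wise function that calls the model only where the mask is True:
import numpy as np

data["selection"] = selection
selected_cols = data.columns[:-1]  # every column except the "selection" mask

def predict(x):
    if x.selection:
        # placeholder string: call your model here, e.g. on x[selected_cols]
        return "model.predict()"
    else:
        return np.nan  # np.NAN was removed in NumPy 2.0

data.apply(predict, axis=1)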
Output:
0    model.predict()
1    model.predict()
2                NaN
3                NaN
4    model.predict()
5                NaN
dtype: object
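A row-wise apply calls the model once per row, which can be slow on larger frames. As an alternative, here is a minimal sketch that makes one batched predict call on the selected rows and writes the results into a NaN-filled Series sharing the DataFrame's index. ThresholdModel and predict_selected are hypothetical names made up for this illustration; ThresholdModel is a toy stand-in for a fitted classifier (any object with a predict method, such as the clf from the question, should slot in, assuming the selected columns are ones it was trained on).

```python
import numpy as np
import pandas as pd


class ThresholdModel:
    """Hypothetical stand-in for a fitted classifier; only predict() matters here."""

    def predict(self, X):
        # Toy rule, not a real model: class 1 when column "A" is at least 5.
        return (X["A"] >= 5).to_numpy().astype(int)


def predict_selected(model, data, mask):
    """Predict only where mask is True; NaN elsewhere, in the original row order."""
    mask = np.asarray(mask, dtype=bool)
    # Start from an all-NaN Series that shares the DataFrame's index.
    result = pd.Series(np.nan, index=data.index, dtype=float)
    if mask.any():
        # One batched predict call on the selected rows, written back in place.
        result.loc[mask] = model.predict(data[mask])
    return result


data = pd.DataFrame(
    {
        "A": [1, 5, 3, 6, 5, 7],
        "B": ["a", "b", "a", "a", "b", "b"],
        "c": [5, 7, 4, 6, 5, 2],
    }
)
selection = [True, True, False, False, True, False]

predictions = predict_selected(ThresholdModel(), data, selection)
print(predictions.tolist())
```

The result comes back as floats because NaN cannot be stored in a plain integer column. The mask.any() guard avoids calling predict on an empty frame, which some models reject.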
