Python version: 3.6.9
I've used pickle to dump a machine learning model into a file, and when I try to run a prediction on it using Flask, it fails with ModuleNotFoundError: No module named 'predictors'. How can I fix this error so that it recognizes my model, whether I try to run a prediction via Flask or via the Python command (e.g. python predict_edu.py)?
Here is my file structure:
- video_discovery
  - __init__.py
  - data_science
    - model
    - __init__.py
    - predict_edu.py
    - predictors.py
    - train_model.py
Here's my predict_edu.py file:
import pickle

with open('model', 'rb') as f:
    bow_model = pickle.load(f)
Here's my predictors.py file:
from sklearn.base import TransformerMixin

# Basic function to clean the text
def clean_text(text):
    # Removing spaces and converting text into lowercase
    return text.strip().lower()

# Custom transformer using spaCy
class predictor_transformer(TransformerMixin):
    def transform(self, X, **transform_params):
        # Cleaning Text
        return [clean_text(text) for text in X]

    def fit(self, X, y=None, **fit_params):
        return self

    def get_params(self, deep=True):
        return {}
Here's how I train my model:
python data_science/train_model.py
Here's my train_model.py file:
import pickle
from sklearn.pipeline import Pipeline
from predictors import predictor_transformer

# pipeline = Pipeline([("cleaner", predictor_transformer()), ('vectorizer', bow_vector), ('classifier', classifier_18p)])
pipeline = Pipeline([("cleaner", predictor_transformer())])
with open('model', 'wb') as f:
    pickle.dump(pipeline, f)
My Flask app is in: video_discovery/__init__.py
Here's how I run my Flask app:
FLASK_ENV=development FLASK_APP=video_discovery flask run
I believe the issue may be occurring because I'm training the model by running the Python script directly instead of using Flask, so there might be some namespace issues, but I'm not sure how to fix this. It takes a while to train my model, so I can't exactly wait on an HTTP request.
What am I missing that might fix this issue?
CodePudding user response:
It seems a bit strange that you get that error when executing predict_edu.py, as it is in the same directory as predictors.py, so an absolute import such as from predictors import predictor_transformer (without the dot . operator) should normally work as expected. However, below are a few options that you could try if the error persists.
Option 1
You could add the directory that contains predictors.py to sys.path (Python's module search path, which is not the same as the system PATH variable) before attempting to import the module.
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent))
from predictors import predictor_transformer
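Before unpickling, it can help to confirm that the module name stored in the pickle is resolvable from the current process. The following is only a minimal sketch and not part of the original answer: the helper name is_importable is hypothetical, and it relies solely on the standard library's importlib.util.find_spec.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be located from the current sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package of a
        # dotted name is missing, and ValueError when an already-loaded
        # module has no usable __spec__.
        return False


if __name__ == "__main__":
    # "predictors" is the module name from the question.
    print("predictors importable:", is_importable("predictors"))
```

If this reports False in the process that loads the model, pickle.load will fail for the same reason, so the sys.path adjustment has to happen before the load.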
Option 2
Use relative imports, e.g., from .predictors import ..., and run the script as a module from the directory that contains the video_discovery package (not from inside the package), as shown below. The -m option "searches sys.path for the named module and executes its contents as the __main__ module", so the file runs as part of its package rather than as a standalone script, which is what lets the relative import resolve. Read more about the -m option in the following references: [1], [2], [3], [4], [5], [6]. Read more about "relative imports" here: [1], [2], [3], [4].
python -m video_discovery.data_science.predict_edu
However, the PEP 8 style guide recommends using absolute imports in general.
Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on sys.path)
In certain cases, however, absolute imports can get quite verbose, depending on the complexity of the directory structure, as shown below. On the other hand, "relative imports can be messy, particularly for shared projects where directory structure is likely to change". They are also "not as readable as absolute ones, and it is hard to tell the location of the imported resources". Read more about Python Import and Absolute vs Relative Imports.
from package1.subpackage2.subpackage3.subpackage4.module5 import function6
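To see the difference that Option 2 relies on, the sketch below builds a throwaway package in a temporary directory and runs the same file twice: once with python -m and once as a plain script. The package and file names (demo_pkg, helpers.py, main.py) are hypothetical and chosen only for illustration, and the sketch assumes the current directory is placed on sys.path when -m is used, which is the default behaviour.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways():
    """Run one file with `python -m` and as a plain script; return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical package layout: demo_pkg/{__init__,helpers,main}.py
        pkg = Path(tmp, "demo_pkg")
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helpers.py").write_text("VALUE = 42\n")
        (pkg / "main.py").write_text("from .helpers import VALUE\nprint(VALUE)\n")

        # Run as a module from the directory that contains the package.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.main"],
            cwd=tmp,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        # Run the file directly: it becomes __main__ with no parent package,
        # so the relative import cannot be resolved.
        as_script = subprocess.run(
            [sys.executable, str(pkg / "main.py")],
            cwd=tmp,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    return as_module, as_script


if __name__ == "__main__":
    as_module, as_script = run_both_ways()
    print("python -m exit code:", as_module.returncode)
    print("direct script exit code:", as_script.returncode)
```

The exact wording of the import error in the direct run differs between Python versions, so the sketch only inspects the exit codes.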
Option 3
Include the directory containing your package directory in PYTHONPATH and use absolute imports instead. PYTHONPATH is used to set the path for user-defined modules, so that they can be directly imported into a Python script. The PYTHONPATH variable is a string with a list of directories that need to be added to the sys.path directory list by Python. The primary use of this variable is to allow users to import modules that have not yet been made into an installable Python package. Read more about it here and here.
For instance, let's say you wanted to add the directory /Users/my_user/code to the PYTHONPATH:
On Mac
- Open Terminal.app
- Open the file ~/.bash_profile in your text editor, e.g. atom ~/.bash_profile
- Add the following line to the end: export PYTHONPATH="/Users/my_user/code"
- Save the file.
- Close Terminal.app
- Start Terminal.app again, to read in the new settings, and type echo $PYTHONPATH. It should show something like /Users/my_user/code.
On Linux
- Open your favorite terminal program
- Open the file ~/.bashrc in your text editor, e.g. atom ~/.bashrc
- Add the following line to the end: export PYTHONPATH=/home/my_user/code
- Save the file.
- Close your terminal application.
- Start your terminal application again, to read in the new settings, and type echo $PYTHONPATH. It should show something like /home/my_user/code.
On Windows
- Open This PC (or Computer), right-click inside and select Properties.
- From the computer properties dialog, select Advanced system settings on the left.
- From the advanced system settings dialog, choose the Environment variables button.
- In the Environment variables dialog, click the New button in the top half of the dialog, to make a new user variable:
- Give the variable name as PYTHONPATH and in value add the path to your module directory. Choose OK and OK again to save this variable.
- Now open a cmd window and type echo %PYTHONPATH% to confirm the environment variable is correctly set. Remember to open a new cmd window to run your Python program, so that it picks up the new settings in PYTHONPATH.
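Whichever platform you are on, you can check that PYTHONPATH actually reaches the interpreter by looking at sys.path in a fresh process. The sketch below is an illustration under stated assumptions: the helper name child_sys_path is hypothetical, and a temporary directory stands in for /Users/my_user/code.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dir):
    """Start a child interpreter with PYTHONPATH=extra_dir and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # A temporary directory plays the role of the code directory here.
    with tempfile.TemporaryDirectory() as code_dir:
        paths = [os.path.realpath(p) for p in child_sys_path(code_dir)]
        print("directory is on sys.path:", os.path.realpath(code_dir) in paths)
```

The variable is read when the interpreter starts, which is why the steps above ask you to open a new terminal before running your program.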
Option 4
Another solution would be to install the package in an editable state (all edits made to the .py files will be automatically included in the installed package), as described here and here. However, the amount of work required to get this to work might make Option 3 a better choice for you.
CodePudding user response:
From https://docs.python.org/3/library/pickle.html:
pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored.
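When you run python data_science/train_model.py and import from predictors, Python imports predictors as a top-level module, so predictor_transformer is in that module and the pickle records it as predictors.predictor_transformer.
However, when you run a prediction via Flask from the parent folder of video_discovery, predictor_transformer is in the video_discovery.data_science.predictors module. Unpickling still tries to import the top-level predictors module recorded in the file, which cannot be found from there, hence the ModuleNotFoundError.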
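The behaviour described above can be reproduced in isolation. The following is a minimal sketch with hypothetical names (demo_predictors, Cleaner, reproduce_missing_module): it pickles an instance while its module is importable, then makes the module unavailable and tries to load the pickle again.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def reproduce_missing_module():
    """Pickle an object, hide its module, and return (payload, load error message)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for predictors.py.
        Path(tmp, "demo_predictors.py").write_text("class Cleaner:\n    pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("demo_predictors")
            payload = pickle.dumps(module.Cleaner())
        finally:
            # Simulate a process where the module is no longer importable.
            sys.path.remove(tmp)
            sys.modules.pop("demo_predictors", None)

    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return payload, str(exc)
    return payload, None


if __name__ == "__main__":
    payload, error = reproduce_missing_module()
    # The pickle stores the module and class names, not the class itself.
    print(b"demo_predictors" in payload)
    print(error)
```

The pickle only contains the names needed to find the class again, so the loading process must be able to import the module under exactly the name it had when the object was dumped.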
Use relative imports and run from a consistent path
train_model.py: Use relative import
# from predictors import predictor_transformer  # -
from .predictors import predictor_transformer   # +
Train model: Run train_model with video_discovery as top-level module
# python data_science/train_model.py                  # -
python -m video_discovery.data_science.train_model    # +
Run a prediction via a Python command: Run predict_edu with video_discovery as top-level module
# python predict_edu.py                               # -
python -m video_discovery.data_science.predict_edu    # +
Run a prediction via Flask: (no change, already run with video_discovery as top-level module)
FLASK_ENV=development FLASK_APP=video_discovery flask run
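One side effect to watch for: both scripts open 'model' relative to the current working directory, which may differ once you launch them with python -m from the parent folder. A possible way around that, sketched here under the assumption that you want the model file stored next to the module, is to build the path from the module's own location. The helper names below are hypothetical.

```python
import pickle
from pathlib import Path


def model_path_for(module_file, filename="model"):
    """Return the path of `filename` in the same directory as `module_file`."""
    return Path(module_file).resolve().parent / filename


def save_model(obj, module_file, filename="model"):
    """Pickle `obj` next to `module_file`, independent of the working directory."""
    with open(str(model_path_for(module_file, filename)), "wb") as f:
        pickle.dump(obj, f)


def load_model(module_file, filename="model"):
    """Load the pickle stored next to `module_file`."""
    with open(str(model_path_for(module_file, filename)), "rb") as f:
        return pickle.load(f)
```

In train_model.py you would call save_model(pipeline, __file__), and in predict_edu.py bow_model = load_model(__file__), so that both resolve to the same file regardless of where the command is run from.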
