Debug scoring scripts with Azure Machine Learning inference HTTP server
The Azure Machine Learning inference HTTP server is a Python package that exposes your scoring function as an HTTP endpoint and wraps the Flask server code and dependencies into a single package. The server is included in the prebuilt Docker images for inference that are used when you deploy a model with Azure Machine Learning. Using the package on its own, you can deploy the model locally and easily validate your scoring (entry) script in a local development environment. If there's a problem with the scoring script, the server returns an error and the location of the error.
The server can also be used to create validation gates in a continuous integration and deployment pipeline. For example, you can start the server with the candidate script and run the test suite against the local endpoint.
This article supports developers who want to use the inference server to debug locally and describes how to use the inference server with online endpoints on Windows.
Prerequisites
To use the Azure Machine Learning inference HTTP server for local debugging, your configuration must include the following components:
- Python 3.8 or later
- Anaconda
The Azure Machine Learning inference HTTP server runs on Windows-based and Linux-based operating systems.
Explore local debugging options for online endpoints
By debugging endpoints locally before you deploy to the cloud, you can catch errors in your code and configuration earlier. To debug endpoints locally, you have several options, including:
- The Azure Machine Learning inference HTTP server
- A local endpoint
This article describes how to work with the Azure Machine Learning inference HTTP server on Windows.
The following table provides an overview of scenarios to help you choose the best option:
Scenario | Inference HTTP server | Local endpoint |
---|---|---|
Update local Python environment without Docker image rebuild | Yes | No |
Update scoring script | Yes | Yes |
Update deployment configurations (deployment, environment, code, model) | No | Yes |
Integrate Microsoft Visual Studio Code (VS Code) Debugger | Yes | Yes |
When you run the inference HTTP server locally, you can focus on debugging your scoring script without concern for deployment container configurations.
Install azureml-inference-server-http package
To install the `azureml-inference-server-http` package, run the following command:

```bash
python -m pip install azureml-inference-server-http
```
Note
To avoid package conflicts, install the inference HTTP server in a virtual environment. You can run the `pip install virtualenv` command to enable virtual environments for your configuration.
Debug your scoring script locally
To debug your scoring script locally, you have several options for testing the server behavior:
- Try a dummy scoring script.
- Use Visual Studio Code to debug with the azureml-inference-server-http package.
- Run an actual scoring script, model file, and environment file from our examples repo.
Test server behavior with dummy scoring script
Create a directory named server_quickstart to hold your files:
```bash
mkdir server_quickstart
cd server_quickstart
```
To avoid package conflicts, create a virtual environment, such as myenv, and activate it:
```bash
python -m virtualenv myenv
```
Note
On Linux, run the `source myenv/bin/activate` command to activate the virtual environment. After you test the server, you can run the `deactivate` command to deactivate the Python virtual environment.

Install the `azureml-inference-server-http` package from the PyPI feed:

```bash
python -m pip install azureml-inference-server-http
```
Create your entry script. The following example creates a basic entry script and saves it to a file named score.py:
```bash
echo -e 'import time\ndef init():\n    time.sleep(1)\n\ndef run(input_data):\n    return {"message": "Hello, World!"}' > score.py
```
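For readability, the resulting score.py file contains the following (whitespace expanded):

```python
import time

def init():
    # Simulate one-time startup work, such as loading a model.
    time.sleep(1)

def run(input_data):
    # Return a static response for any request payload.
    return {"message": "Hello, World!"}
```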
Start the server with the `azmlinfsrv` command and set the score.py file as the entry script:

```bash
azmlinfsrv --entry_script score.py
```
Note
The server is hosted on 0.0.0.0, which means it listens to all IP addresses of the hosting machine.
Send a scoring request to the server by using the `curl` utility:

```bash
curl -p 127.0.0.1:5001/score
```
The server posts the following response:
```json
{"message": "Hello, World!"}
```
After testing, select Ctrl + C to terminate the server.
Now you can modify the scoring script file (score.py) and test your changes by running the server again with the `azmlinfsrv --entry_script score.py` command.
Integrate with Visual Studio Code
To use VS Code and the Python Extension for debugging with the azureml-inference-server-http package, you can use the Launch and Attach modes.
For Launch mode, set up the launch.json file in VS Code and start the Azure Machine Learning inference HTTP server within VS Code:
Start VS Code and open the folder containing the script (score.py).
Add the following configuration to the launch.json file for that workspace in VS Code:
launch.json
```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug score.py",
      "type": "python",
      "request": "launch",
      "module": "azureml_inference_server_http.amlserver",
      "args": [
        "--entry_script",
        "score.py"
      ]
    }
  ]
}
```
Start the debugging session in VS Code by selecting Run > Start Debugging or use the keyboard shortcut F5.
For Attach mode, start the Azure Machine Learning inference HTTP server in a command window, and use VS Code with the Python Extension to attach to the process:
Note
For Linux, first install the `gdb` package by running the `sudo apt-get install -y gdb` command.

Add the following configuration to the launch.json file for that workspace in VS Code:
launch.json
```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Attach using Process Id",
      "type": "python",
      "request": "attach",
      "processId": "${command:pickProcess}",
      "justMyCode": true
    }
  ]
}
```
1. In a command window, start the inference HTTP server by using the `azmlinfsrv --entry_script score.py` command.
2. Start the debugging session in VS Code by selecting Run > Start Debugging or using the keyboard shortcut F5.
3. In the command window, view the logs from the inference server and locate the process ID of the `azmlinfsrv` command (not the `gunicorn` process).
4. In the VS Code Debugger, enter the process ID of the `azmlinfsrv` command. If you don't see the VS Code process picker, you can manually enter the process ID in the `processId` field of the launch.json file for that workspace.
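On Linux, one way to locate the process ID of the `azmlinfsrv` command is with `pgrep` (a sketch; the process name matches the command you started):

```bash
# List running azmlinfsrv processes with their PIDs and full command lines
pgrep -af azmlinfsrv
```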
For both modes, you can set breakpoints and debug the script step by step.
Use an end-to-end example
The following procedure runs the server locally with sample files (scoring script, model file, and environment) from the Azure Machine Learning example repository. For more examples of how to use these sample files, see Deploy and score a machine learning model by using an online endpoint.
Clone the sample repository:
```bash
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples/cli/endpoints/online/model-1/
```
Create and activate a virtual environment with conda:
In this example, the `azureml-inference-server-http` package is automatically installed. The package is included as a dependent library of the `azureml-defaults` package in the conda.yml file:

```bash
# Create the environment from the YAML file
conda env create --name model-env -f ./environment/conda.yml
# Activate the new environment
conda activate model-env
```
Review your scoring script:
onlinescoring/score.py
```python
import os
import logging
import json
import numpy
import joblib


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory
    """
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # Please provide your model's folder name if there is one
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "model/sklearn_regression_model.pkl"
    )
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    In the example we extract the data from the json input and call the scikit-learn model's predict()
    method and return the result back
    """
    logging.info("model 1: request received")
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)
    logging.info("Request processed")
    return result.tolist()
```
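To sanity-check the `run()` parsing logic without the server or a real model, you can sketch a small offline harness. The `StubModel` class below is hypothetical; the real script deserializes a scikit-learn model with `joblib`:

```python
import json

class StubModel:
    """Hypothetical stand-in for the scikit-learn model loaded by init()."""
    def predict(self, data):
        # Pretend prediction: sum the features of each row.
        return [sum(row) for row in data]

model = StubModel()

def run(raw_data):
    # Mirrors the example scoring script: parse the "data" key and predict.
    data = json.loads(raw_data)["data"]
    result = model.predict(data)
    return list(result)

print(run('{"data": [[1, 2, 3], [4, 5, 6]]}'))  # prints [6, 15]
```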
Run the inference HTTP server by specifying the scoring script and model file. The model directory specified in the `model_dir` parameter is defined by using the `AZUREML_MODEL_DIR` variable and retrieved in the scoring script. In this case, you specify the current directory (./) because the subdirectory is specified in the scoring script as model/sklearn_regression_model.pkl.

```bash
azmlinfsrv --entry_script ./onlinescoring/score.py --model_dir ./
```
When the server launches and successfully invokes the scoring script, the example startup log opens. Otherwise, the log shows error messages.
Test the scoring script with sample data:
Open another command window and change into the same working directory where you ran the command.
Use the `curl` utility to send an example request to the server and receive a scoring result:

```bash
curl --request POST "127.0.0.1:5001/score" --header "Content-Type:application/json" --data @sample-request.json
```
When there are no problems in your scoring script, the script returns the scoring result. If problems occur, you can try to update the scoring script, and launch the server again to test the updated script.
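Because the example scoring script reads `json.loads(raw_data)["data"]`, the request file is a JSON object with a `data` key holding rows of features. A payload of this shape would work (the feature values shown here are illustrative, not the contents of the repository's sample-request.json):

```json
{
  "data": [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  ]
}
```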
Review server routes
The inference HTTP server listens on port 5001 by default at the following routes:
Name | Route |
---|---|
Liveness Probe | 127.0.0.1:5001/ |
Score | 127.0.0.1:5001/score |
OpenAPI (swagger) | 127.0.0.1:5001/swagger.json |
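While the server is running, you can probe each route locally with `curl` (a sketch; adjust the port if you changed it):

```bash
# Liveness probe
curl 127.0.0.1:5001/
# OpenAPI (swagger) document
curl 127.0.0.1:5001/swagger.json
```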
Review server parameters
The inference HTTP server accepts the following parameters:
Parameter | Required | Default | Description |
---|---|---|---|
`entry_script` | True | N/A | Identifies the relative or absolute path to the scoring script. |
`model_dir` | False | N/A | Identifies the relative or absolute path to the directory that holds the model used for inferencing. |
`port` | False | 5001 | Specifies the serving port of the server. |
`worker_count` | False | 1 | Provides the number of worker threads to process concurrent requests. |
`appinsights_instrumentation_key` | False | N/A | Provides the instrumentation key to the Application Insights instance where the logs are published. |
`access_control_allow_origins` | False | N/A | Enables CORS for the specified origins, where multiple origins are separated by a comma (,), such as `microsoft.com,bing.com`. |
Explore server request processing
The following steps demonstrate how the Azure Machine Learning inference HTTP server (`azmlinfsrv`) handles incoming requests:
1. A Python CLI wrapper sits around the server's network stack and is used to start the server.
2. A client sends a request to the server.
3. The server sends the request through the Web Server Gateway Interface (WSGI) server, which dispatches the request to a Flask worker application.
4. The Flask worker app handles the request, which includes loading the entry script and any dependencies.
5. Your entry script receives the request. The entry script makes an inference call to the loaded model and returns a response.
Explore server logs
There are two ways to obtain log data for the inference HTTP server:

- Run the `azureml-inference-server-http` package locally and view the log output.
- Use online endpoints and view the container logs. The log for the inference server is named Azure Machine Learning Inferencing HTTP server <version>.
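For online endpoints, the Azure CLI `ml` extension can retrieve container logs; a sketch (the endpoint and deployment names are placeholders):

```bash
az ml online-deployment get-logs \
    --endpoint-name <endpoint-name> \
    --name <deployment-name> \
    --lines 100
```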
Note
The logging format has changed since version 0.8.0. If your log uses a different style than expected, update the `azureml-inference-server-http` package to the latest version.
View startup logs
When the server starts, the logs show the initial server settings as follows:
```
Azure Machine Learning Inferencing HTTP server <version>

Server Settings
---------------
Entry Script Name: <entry_script>
Model Directory: <model_dir>
Worker Count: <worker_count>
Worker Timeout (seconds): None
Server Port: <port>
Application Insights Enabled: false
Application Insights Key: <appinsights_instrumentation_key>
Inferencing HTTP server version: azmlinfsrv/<version>
CORS for the specified origins: <access_control_allow_origins>

Server Routes
---------------
Liveness Probe: GET   127.0.0.1:<port>/
Score:          POST  127.0.0.1:<port>/score

<logs>
```
For example, when you launch the server by following the end-to-end example, the log displays as follows:
```
Azure Machine Learning Inferencing HTTP server v0.8.0

Server Settings
---------------
Entry Script Name: /home/user-name/azureml-examples/cli/endpoints/online/model-1/onlinescoring/score.py
Model Directory: ./
Worker Count: 1
Worker Timeout (seconds): None
Server Port: 5001
Application Insights Enabled: false
Application Insights Key: None
Inferencing HTTP server version: azmlinfsrv/0.8.0
CORS for the specified origins: None

Server Routes
---------------
Liveness Probe: GET   127.0.0.1:5001/
Score:          POST  127.0.0.1:5001/score

2022-12-24 07:37:53,318 I [32726] gunicorn.error - Starting gunicorn 20.1.0
2022-12-24 07:37:53,319 I [32726] gunicorn.error - Listening at: http://0.0.0.0:5001 (32726)
2022-12-24 07:37:53,319 I [32726] gunicorn.error - Using worker: sync
2022-12-24 07:37:53,322 I [32756] gunicorn.error - Booting worker with pid: 32756
Initializing logger
2022-12-24 07:37:53,779 I [32756] azmlinfsrv - Starting up app insights client
2022-12-24 07:37:54,518 I [32756] azmlinfsrv.user_script - Found user script at /home/user-name/azureml-examples/cli/endpoints/online/model-1/onlinescoring/score.py
2022-12-24 07:37:54,518 I [32756] azmlinfsrv.user_script - run() is not decorated. Server will invoke it with the input in JSON string.
2022-12-24 07:37:54,518 I [32756] azmlinfsrv.user_script - Invoking user's init function
2022-12-24 07:37:55,974 I [32756] azmlinfsrv.user_script - Users's init has completed successfully
2022-12-24 07:37:55,976 I [32756] azmlinfsrv.swagger - Swaggers are prepared for the following versions: [2, 3, 3.1].
2022-12-24 07:37:55,977 I [32756] azmlinfsrv - AML_FLASK_ONE_COMPATIBILITY is set, but patching is not necessary.
```
Understand log data format
All logs from the inference HTTP server, except for the launcher script, present data in the following format:
```
<UTC Time> | <level> [<pid>] <logger name> - <message>
```
The entry consists of the following components:
- `<UTC Time>`: Time when the entry was entered into the log.
- `<pid>`: ID of the process associated with the entry.
- `<level>`: First character of the logging level for the entry, such as `E` for ERROR, `I` for INFO, and so on.
- `<logger name>`: Name of the resource associated with the log entry.
- `<message>`: Contents of the log message.
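The components above can be pulled out of a log line with a small parser. The following is a sketch (the regular expression is a hypothetical match for the format shown, not part of the server package):

```python
import re

# Hypothetical parser for the server's log format:
# <UTC Time> <level> [<pid>] <logger name> - <message>
LOG_LINE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]) "
    r"\[(?P<pid>\d+)\] "
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)"
)

def parse_log_line(line):
    """Return the log entry components as a dict, or None if unmatched."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

entry = parse_log_line(
    "2022-12-24 07:37:53,318 I [32726] gunicorn.error - Starting gunicorn 20.1.0"
)
print(entry["level"], entry["pid"], entry["logger"])  # prints: I 32726 gunicorn.error
```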
There are six levels of logging in Python with assigned numeric values according to severity:
Logging level | Numeric value |
---|---|
CRITICAL | 50 |
ERROR | 40 |
WARNING | 30 |
INFO | 20 |
DEBUG | 10 |
NOTSET | 0 |
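These are the standard levels from Python's built-in `logging` module, so the numeric values in the table can be confirmed directly:

```python
import logging

# The standard Python logging levels and their numeric values,
# matching the table above.
for name in ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"):
    print(f"{name}: {getattr(logging, name)}")
```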
Troubleshoot server issues
The following sections provide basic troubleshooting tips for the Azure Machine Learning inference HTTP server. To troubleshoot online endpoints, see Troubleshoot online endpoints deployment.
Check installed packages
Follow these steps to address issues with installed packages:
Gather information about installed packages and versions for your Python environment.
Confirm that the `azureml-inference-server-http` Python package version specified in the environment file matches the Azure Machine Learning inference HTTP server version displayed in the startup log. In some cases, the pip dependency resolver installs unexpected package versions. You might need to run `pip` to correct installed packages and versions.
If you specify Flask or its dependencies in your environment, remove these items. Dependent packages include `flask`, `jinja2`, `itsdangerous`, `werkzeug`, `markupsafe`, and `click`. `flask` is listed as a dependency in the server package, so the best approach is to allow the inference server to install the `flask` package. When the inference server is configured to support new versions of Flask, the server automatically receives the package updates as they become available.
Check server version
The `azureml-inference-server-http` server package is published to PyPI. The PyPI page lists the changelog and all previous versions. If you use an earlier package version, update your configuration to the latest version.
The following table summarizes stable versions, common issues, and recommended adjustments:
Package version | Description | Issue | Resolution |
---|---|---|---|
0.4.x | Bundled in training images dated 20220601 or earlier and `azureml-defaults` package versions 1.34 through 1.43. Latest stable version is 0.4.13. | For server versions earlier than 0.4.11, you might encounter Flask dependency issues, such as "can't import name Markup from jinja2." | Upgrade to version 0.4.13 or 0.8.x (the latest version), if possible. |
0.6.x | Preinstalled in inferencing images dated 20220516 and earlier. Latest stable version is 0.6.1. | N/A | N/A |
0.7.x | Supports Flask 2. Latest stable version is 0.7.7. | N/A | N/A |
0.8.x | Log format changed. Python 3.6 support ended. | N/A | N/A |
Check package dependencies
The most relevant dependent packages for the `azureml-inference-server-http` server package include:

- `flask`
- `opencensus-ext-azure`
- `inference-schema`

If you specified the `azureml-defaults` package in your Python environment, the `azureml-inference-server-http` package is a dependent package and is installed automatically.
Tip
If you use Python SDK v1 and don't explicitly specify the `azureml-defaults` package in your Python environment, the SDK might automatically add the package. However, the package version is locked relative to the SDK version. For example, if the SDK version is 1.38.0, then the `azureml-defaults==1.38.0` entry is added to the environment's pip requirements.
Frequently asked questions
The following sections describe possible resolutions for frequently asked questions about the Azure Machine Learning inference HTTP server.
TypeError during server startup
You might encounter a TypeError during server startup, as follows:
```
File "/var/azureml-server/aml_blueprint.py", line 251, in register
    super(AMLBlueprint, self).register(app, options, first_registration)
TypeError: register() takes 3 positional arguments but 4 were given
```
This error occurs when you have Flask 2 installed in your Python environment, but your `azureml-inference-server-http` package version doesn't support Flask 2. Support for Flask 2 is available in `azureml-inference-server-http` package version 0.7.0 and later, and `azureml-defaults` package version 1.44 and later.

- If you don't use the Flask 2 package in an Azure Machine Learning docker image, use the latest version of the `azureml-inference-server-http` or `azureml-defaults` package.
- If you use the Flask 2 package in an Azure Machine Learning docker image, confirm that the image build version is July 2022 or later.
You can find the image version in the container logs:
```
2022-08-22T17:05:02,147738763+00:00 | gunicorn/run | AzureML Container Runtime Information
2022-08-22T17:05:02,161963207+00:00 | gunicorn/run | ###############################################
2022-08-22T17:05:02,168970479+00:00 | gunicorn/run |
2022-08-22T17:05:02,174364834+00:00 | gunicorn/run |
2022-08-22T17:05:02,187280665+00:00 | gunicorn/run | AzureML image information: openmpi4.1.0-ubuntu20.04, Materialization Build:20220708.v2
2022-08-22T17:05:02,188930082+00:00 | gunicorn/run |
2022-08-22T17:05:02,190557998+00:00 | gunicorn/run |
```
The build date of the image appears after the `Materialization Build` notation. In the preceding example, the image version is 20220708, or July 8, 2022, so the image is compatible with Flask 2. If you don't see a similar message in your container log, your image is out-of-date and should be updated. If you use a Compute Unified Device Architecture (CUDA) image and you can't find a newer image, check whether your image is deprecated in AzureML-Containers. You can find designated replacements for deprecated images.

If you use the server with an online endpoint, you can also find the logs under "Deployment logs" on the online endpoint page in Azure Machine Learning studio.

If you deploy with SDK v1 and don't explicitly specify an image in your deployment configuration, the server applies the `openmpi4.1.0-ubuntu20.04` image with a version that matches your local SDK toolset. However, the installed version might not be the latest available version of the image. For SDK version 1.43, the server installs the `openmpi4.1.0-ubuntu20.04:20220616` image version by default, but this image version isn't compatible with SDK 1.43. Make sure you use the latest SDK for your deployment.

If you can't update the image, you can temporarily avoid the issue by pinning the `azureml-defaults==1.43` or `azureml-inference-server-http~=0.4.13` entries in your environment file. These entries direct the server to install the older version with `flask 1.0.x`.
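A pinned environment file might look like the following sketch (the package pin comes from the source; the surrounding conda.yml structure and Python version are illustrative):

```yaml
# Illustrative conda.yml fragment: pin an older server version for Flask 1 compatibility
dependencies:
  - python=3.8
  - pip:
      - azureml-defaults==1.43
```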
ImportError or ModuleNotFoundError during server startup
You might encounter an `ImportError` or `ModuleNotFoundError` on specific modules during server startup, such as `opencensus`, `jinja2`, `markupsafe`, or `click`. Here's an example of the error message:
```
ImportError: cannot import name 'Markup' from 'jinja2'
```
The import and module errors occur when you use older versions of the server (version 0.4.10 and earlier) that don't pin the Flask dependency to a compatible version.
To prevent the issue, install a later version of the server.