Debug scoring scripts with Azure Machine Learning inference HTTP server
The Azure Machine Learning inference HTTP server is a Python package that exposes your scoring function as an HTTP endpoint and wraps the Flask server code and dependencies into a single package. The server is included in the prebuilt Docker images for inference that are used when you deploy a model with Azure Machine Learning. Using the package on its own, you can deploy the model locally and easily validate your scoring (entry) script in a local development environment. If there's a problem with the scoring script, the server returns an error and the location of the error.
The server can also be used to create validation gates in a continuous integration and deployment pipeline. For example, you can start the server with the candidate script and run the test suite against the local endpoint.
This article supports developers who want to use the inference server to debug locally and describes how to use the inference server with online endpoints on Windows.
Prerequisites
To use the Azure Machine Learning inference HTTP server for local debugging, your configuration must include the following components:
- Python 3.8 or later
- Anaconda
The Azure Machine Learning inference HTTP server runs on Windows-based and Linux-based operating systems.
Explore local debugging options for online endpoints
By debugging endpoints locally before you deploy to the cloud, you can catch errors in your code and configuration earlier. To debug endpoints locally, you have several options, including:
- The Azure Machine Learning inference HTTP server
- A local endpoint
This article describes how to work with the Azure Machine Learning inference HTTP server on Windows.
The following table provides an overview of scenarios to help you choose the best option:
Scenario | Inference HTTP server | Local endpoint |
---|---|---|
Update local Python environment without Docker image rebuild | Yes | No |
Update scoring script | Yes | Yes |
Update deployment configurations (deployment, environment, code, model) | No | Yes |
Integrate Microsoft Visual Studio Code (VS Code) Debugger | Yes | Yes |
When you run the inference HTTP server locally, you can focus on debugging your scoring script without concern for deployment container configurations.
Install azureml-inference-server-http package
To install the `azureml-inference-server-http` package, run the following command:

```bash
python -m pip install azureml-inference-server-http
```
Note
To avoid package conflicts, install the inference HTTP server in a virtual environment. You can run the `pip install virtualenv` command to enable virtual environments for your configuration.
Debug your scoring script locally
To debug your scoring script locally, you have several options for testing the server behavior:
- Try a dummy scoring script.
- Use Visual Studio Code to debug with the azureml-inference-server-http package.
- Run an actual scoring script, model file, and environment file from our examples repo.
Test server behavior with dummy scoring script
Create a directory named server_quickstart to hold your files:
```bash
mkdir server_quickstart
cd server_quickstart
```
To avoid package conflicts, create a virtual environment, such as myenv, and activate it:
```bash
python -m virtualenv myenv
```
Note
On Linux, run the `source myenv/bin/activate` command to activate the virtual environment. After you test the server, you can run the `deactivate` command to deactivate the Python virtual environment.

Install the `azureml-inference-server-http` package from the PyPI feed:

```bash
python -m pip install azureml-inference-server-http
```
Create your entry script. The following example creates a basic entry script and saves it to a file named score.py:
```bash
echo -e 'import time\ndef init():\n    time.sleep(1)\n\ndef run(input_data):\n    return {"message": "Hello, World!"}' > score.py
```
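For readability, the resulting score.py file contains the following (whitespace expanded):

```python
import time

def init():
    # Simulate one-time startup work, such as loading a model.
    time.sleep(1)

def run(input_data):
    # Return a static response for any request payload.
    return {"message": "Hello, World!"}
```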
Start the server with the `azmlinfsrv` command and set the score.py file as the entry script:

```bash
azmlinfsrv --entry_script score.py
```
Note
The server is hosted on 0.0.0.0, which means it listens to all IP addresses of the hosting machine.
Send a scoring request to the server by using the `curl` utility:

```bash
curl -p 127.0.0.1:5001/score
```
The server posts the following response:
```json
{"message": "Hello, World!"}
```
After testing, select Ctrl + C to terminate the server.
Now you can modify the scoring script file (score.py) and test your changes by running the server again with the `azmlinfsrv --entry_script score.py` command.
Integrate with Visual Studio Code
To use VS Code and the Python Extension for debugging with the azureml-inference-server-http package, you can use the Launch and Attach modes.
For Launch mode, set up the launch.json file in VS Code and start the Azure Machine Learning inference HTTP server within VS Code:
Start VS Code and open the folder containing the script (score.py).
Add the following configuration to the launch.json file for that workspace in VS Code:
launch.json
```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug score.py",
      "type": "python",
      "request": "launch",
      "module": "azureml_inference_server_http.amlserver",
      "args": [
        "--entry_script",
        "score.py"
      ]
    }
  ]
}
```
Start the debugging session in VS Code by selecting Run > Start Debugging or use the keyboard shortcut F5.
For Attach mode, start the Azure Machine Learning inference HTTP server in a command window, and use VS Code with the Python Extension to attach to the process:
Note
For Linux, first install the `gdb` package by running the `sudo apt-get install -y gdb` command.

Add the following configuration to the launch.json file for that workspace in VS Code:
launch.json
```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Attach using Process Id",
      "type": "python",
      "request": "attach",
      "processId": "${command:pickProcess}",
      "justMyCode": true
    }
  ]
}
```
1. In a command window, start the inference HTTP server by using the `azmlinfsrv --entry_script score.py` command.
2. Start the debugging session in VS Code by selecting Run > Start Debugging or using the keyboard shortcut F5.
3. In the command window, view the logs from the inference server and locate the process ID of the `azmlinfsrv` command (not the `gunicorn` process).
4. In the VS Code Debugger, enter the process ID of the `azmlinfsrv` command. If you don't see the VS Code process picker, you can manually enter the process ID in the `processId` field of the launch.json file for that workspace.
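On Linux, one way to locate the process ID of the `azmlinfsrv` command is with `pgrep` (a sketch; the process name matches the command you started):

```bash
# List running azmlinfsrv processes with their PIDs and full command lines
pgrep -af azmlinfsrv
```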
For both modes, you can set breakpoints and debug the script step by step.
Use an end-to-end example
The following procedure runs the server locally with sample files (scoring script, model file, and environment) from the Azure Machine Learning example repository. For more examples of how to use these sample files, see Deploy and score a machine learning model by using an online endpoint.
Clone the sample repository:
```bash
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples/cli/endpoints/online/model-1/
```
Create and activate a virtual environment with conda:
In this example, the `azureml-inference-server-http` package is automatically installed. The package is included as a dependent library of the `azureml-defaults` package in the conda.yml file:

```bash
# Create the environment from the YAML file
conda env create --name model-env -f ./environment/conda.yml
# Activate the new environment
conda activate model-env
```
Review your scoring script:
onlinescoring/score.py
```python
import os
import logging
import json
import numpy
import joblib


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory
    """
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # Please provide your model's folder name if there is one
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "model/sklearn_regression_model.pkl"
    )
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    In the example we extract the data from the json input and call the scikit-learn model's predict()
    method and return the result back
    """
    logging.info("model 1: request received")
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)
    logging.info("Request processed")
    return result.tolist()
```
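To sanity-check the `run()` parsing logic without the server or a real model, you can sketch a small offline harness. The `StubModel` class below is hypothetical; the real script deserializes a scikit-learn model with `joblib`:

```python
import json

class StubModel:
    """Hypothetical stand-in for the scikit-learn model loaded by init()."""
    def predict(self, data):
        # Pretend prediction: sum the features of each row.
        return [sum(row) for row in data]

model = StubModel()

def run(raw_data):
    # Mirrors the example scoring script: parse the "data" key and predict.
    data = json.loads(raw_data)["data"]
    result = model.predict(data)
    return list(result)

print(run('{"data": [[1, 2, 3], [4, 5, 6]]}'))  # prints [6, 15]
```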
Run the inference HTTP server by specifying the scoring script and model file. The model directory specified in the `model_dir` parameter is defined by using the `AZUREML_MODEL_DIR` variable and retrieved in the scoring script. In this case, you specify the current directory (./) because the subdirectory is specified in the scoring script as model/sklearn_regression_model.pkl.

```bash
azmlinfsrv --entry_script ./onlinescoring/score.py --model_dir ./
```
When the server launches and successfully invokes the scoring script, the example startup log opens. Otherwise, the log shows error messages.
Test the scoring script with sample data:
Open another command window and change into the same working directory where you ran the command.
Use the `curl` utility to send an example request to the server and receive a scoring result:

```bash
curl --request POST "127.0.0.1:5001/score" --header "Content-Type:application/json" --data @sample-request.json
```
When there are no problems in your scoring script, the script returns the scoring result. If problems occur, you can try to update the scoring script, and launch the server again to test the updated script.
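Because the example scoring script reads `json.loads(raw_data)["data"]`, the request file is a JSON object with a `data` key holding rows of features. A payload of this shape would work (the feature values shown here are illustrative, not the contents of the repository's sample-request.json):

```json
{
  "data": [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  ]
}
```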
Review server routes
The inference HTTP server listens on port 5001 by default at the following routes:
Name | Route |
---|---|
Liveness Probe | 127.0.0.1:5001/ |
Score | 127.0.0.1:5001/score |
OpenAPI (swagger) | 127.0.0.1:5001/swagger.json |
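While the server is running, you can probe each route locally with `curl` (a sketch; adjust the port if you changed it):

```bash
# Liveness probe
curl 127.0.0.1:5001/
# OpenAPI (swagger) document
curl 127.0.0.1:5001/swagger.json
```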
Review server parameters
The inference HTTP server accepts the following parameters:
Parameter | Required | Default | Description |
---|---|---|---|
`entry_script` | True | N/A | Identifies the relative or absolute path to the scoring script. |
`model_dir` | False | N/A | Identifies the relative or absolute path to the directory that holds the model used for inferencing. |
`port` | False | 5001 | Specifies the serving port of the server. |
`worker_count` | False | 1 | Provides the number of worker threads to process concurrent requests. |
`appinsights_instrumentation_key` | False | N/A | Provides the instrumentation key to the Application Insights instance where the logs are published. |
`access_control_allow_origins` | False | N/A | Enables CORS for the specified origins, where multiple origins are separated by a comma (,), such as `microsoft.com,bing.com`. |
Explore server request processing
The following steps demonstrate how the Azure Machine Learning inference HTTP server (`azmlinfsrv`) handles incoming requests:
1. A Python CLI wrapper sits around the server's network stack and is used to start the server.
2. A client sends a request to the server.
3. The server sends the request through the Web Server Gateway Interface (WSGI) server, which dispatches the request to a Flask worker application.
4. The Flask worker app handles the request, which includes loading the entry script and any dependencies.
5. Your entry script receives the request. The entry script makes an inference call to the loaded model and returns a response.
Explore server logs
There are two ways to obtain log data for the inference HTTP server:

- Run the `azureml-inference-server-http` package locally and view the log output.
- Use online endpoints and view the container logs. The log for the inference server is named Azure Machine Learning Inferencing HTTP server <version>.
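For online endpoints, the Azure CLI `ml` extension can retrieve container logs; a sketch (the endpoint and deployment names are placeholders):

```bash
az ml online-deployment get-logs \
    --endpoint-name <endpoint-name> \
    --name <deployment-name> \
    --lines 100
```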
Note
The logging format has changed since version 0.8.0. If your log uses a different style than expected, update the `azureml-inference-server-http` package to the latest version.
View startup logs
When the server starts, the logs show the initial server settings as follows:
```
Azure Machine Learning Inferencing HTTP server <version>

Server Settings
---------------
Entry Script Name: <entry_script>
Model Directory: <model_dir>
Worker Count: <worker_count>
Worker Timeout (seconds): None
Server Port: <port>
Application Insights Enabled: false
Application Insights Key: <appinsights_instrumentation_key>
Inferencing HTTP server version: azmlinfsrv/<version>
CORS for the specified origins: <access_control_allow_origins>

Server Routes
---------------
Liveness Probe: GET   127.0.0.1:<port>/
Score:          POST  127.0.0.1:<port>/score

<logs>
```
For example, when you launch the server by following the end-to-end example, the log displays as follows:
```
Azure Machine Learning Inferencing HTTP server v0.8.0

Server Settings
---------------
Entry Script Name: /home/user-name/azureml-examples/cli/endpoints/online/model-1/onlinescoring/score.py
Model Directory: ./
Worker Count: 1
Worker Timeout (seconds): None
Server Port: 5001
Application Insights Enabled: false
Application Insights Key: None
Inferencing HTTP server version: azmlinfsrv/0.8.0
CORS for the specified origins: None

Server Routes
---------------
Liveness Probe: GET   127.0.0.1:5001/
Score:          POST  127.0.0.1:5001/score

2022-12-24 07:37:53,318 I [32726] gunicorn.error - Starting gunicorn 20.1.0
2022-12-24 07:37:53,319 I [32726] gunicorn.error - Listening at: http://0.0.0.0:5001 (32726)
2022-12-24 07:37:53,319 I [32726] gunicorn.error - Using worker: sync
2022-12-24 07:37:53,322 I [32756] gunicorn.error - Booting worker with pid: 32756
Initializing logger
2022-12-24 07:37:53,779 I [32756] azmlinfsrv - Starting up app insights client
2022-12-24 07:37:54,518 I [32756] azmlinfsrv.user_script - Found user script at /home/user-name/azureml-examples/cli/endpoints/online/model-1/onlinescoring/score.py
2022-12-24 07:37:54,518 I [32756] azmlinfsrv.user_script - run() is not decorated. Server will invoke it with the input in JSON string.
2022-12-24 07:37:54,518 I [32756] azmlinfsrv.user_script - Invoking user's init function
2022-12-24 07:37:55,974 I [32756] azmlinfsrv.user_script - Users's init has completed successfully
2022-12-24 07:37:55,976 I [32756] azmlinfsrv.swagger - Swaggers are prepared for the following versions: [2, 3, 3.1].
2022-12-24 07:37:55,977 I [32756] azmlinfsrv - AML_FLASK_ONE_COMPATIBILITY is set, but patching is not necessary.
```
Understand log data format
All logs from the inference HTTP server, except for the launcher script, present data in the following format:
```
<UTC Time> | <level> [<pid>] <logger name> - <message>
```
The entry consists of the following components:
- `<UTC Time>`: Time when the entry was entered into the log.
- `<pid>`: ID of the process associated with the entry.
- `<level>`: First character of the logging level for the entry, such as `E` for ERROR, `I` for INFO, and so on.
- `<logger name>`: Name of the resource associated with the log entry.
- `<message>`: Contents of the log message.
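The components above can be pulled out of a log line with a small parser. The following is a sketch (the regular expression is a hypothetical match for the format shown, not part of the server package):

```python
import re

# Hypothetical parser for the server's log format:
# <UTC Time> <level> [<pid>] <logger name> - <message>
LOG_LINE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]) "
    r"\[(?P<pid>\d+)\] "
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)"
)

def parse_log_line(line):
    """Return the log entry components as a dict, or None if unmatched."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

entry = parse_log_line(
    "2022-12-24 07:37:53,318 I [32726] gunicorn.error - Starting gunicorn 20.1.0"
)
print(entry["level"], entry["pid"], entry["logger"])  # prints: I 32726 gunicorn.error
```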
There are six levels of logging in Python with assigned numeric values according to severity:
Logging level | Numeric value |
---|---|
CRITICAL | 50 |
ERROR | 40 |
WARNING | 30 |
INFO | 20 |
DEBUG | 10 |
NOTSET | 0 |
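These are the standard levels from Python's built-in `logging` module, so the numeric values in the table can be confirmed directly:

```python
import logging

# The standard Python logging levels and their numeric values,
# matching the table above.
for name in ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"):
    print(f"{name}: {getattr(logging, name)}")
```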
Troubleshoot server issues
The following sections provide basic troubleshooting tips for the Azure Machine Learning inference HTTP server. To troubleshoot online endpoints, see Troubleshoot online endpoints deployment.
Check installed packages
Follow these steps to address issues with installed packages:
Gather information about installed packages and versions for your Python environment.
Confirm that the `azureml-inference-server-http` Python package version specified in the environment file matches the Azure Machine Learning inference HTTP server version displayed in the startup log. In some cases, the pip dependency resolver installs unexpected package versions. You might need to run `pip` to correct installed packages and versions.
If you specify Flask or its dependencies in your environment, remove these items. Dependent packages include `flask`, `jinja2`, `itsdangerous`, `werkzeug`, `markupsafe`, and `click`. `flask` is listed as a dependency in the server package, so the best approach is to allow the inference server to install the `flask` package. When the inference server is configured to support new versions of Flask, the server automatically receives the package updates as they become available.
Check server version
The `azureml-inference-server-http` server package is published to PyPI. The PyPI page lists the changelog and all previous versions. If you use an earlier package version, update your configuration to the latest version.
The following table summarizes stable versions, common issues, and recommended adjustments:
Package version | Description | Issue | Resolution |
---|---|---|---|
0.4.x | Bundled in training images dated 20220601 or earlier and `azureml-defaults` package versions 1.34 through 1.43. Latest stable version is 0.4.13. | For server versions earlier than 0.4.11, you might encounter Flask dependency issues, such as "can't import name Markup from jinja2." | Upgrade to version 0.4.13 or 0.8.x (the latest version), if possible. |
0.6.x | Preinstalled in inferencing images dated 20220516 and earlier. Latest stable version is 0.6.1. | N/A | N/A |
0.7.x | Supports Flask 2. Latest stable version is 0.7.7. | N/A | N/A |
0.8.x | Log format changed. Python 3.6 support ended. | N/A | N/A |
Check package dependencies
The most relevant dependent packages for the `azureml-inference-server-http` server package include:

- `flask`
- `opencensus-ext-azure`
- `inference-schema`

If you specified the `azureml-defaults` package in your Python environment, the `azureml-inference-server-http` package is a dependent package and is installed automatically.
Tip
If you use Python SDK v1 and don't explicitly specify the `azureml-defaults` package in your Python environment, the SDK might automatically add the package. However, the package version is locked relative to the SDK version. For example, if the SDK version is 1.38.0, then the `azureml-defaults==1.38.0` entry is added to the environment's pip requirements.
Frequently asked questions
The following sections describe possible resolutions for frequently asked questions about the Azure Machine Learning inference HTTP server.
TypeError during server startup
You might encounter a TypeError during server startup, as follows:
```
File "/var/azureml-server/aml_blueprint.py", line 251, in register
    super(AMLBlueprint, self).register(app, options, first_registration)
TypeError: register() takes 3 positional arguments but 4 were given
```
This error occurs when you have Flask 2 installed in your Python environment, but your `azureml-inference-server-http` package version doesn't support Flask 2. Support for Flask 2 is available in `azureml-inference-server-http` package version 0.7.0 and later, and `azureml-defaults` package version 1.44 and later.

- If you don't use the Flask 2 package in an Azure Machine Learning docker image, use the latest version of the `azureml-inference-server-http` or `azureml-defaults` package.
- If you use the Flask 2 package in an Azure Machine Learning docker image, confirm that the image build version is July 2022 or later.
You can find the image version in the container logs:
```
2022-08-22T17:05:02,147738763+00:00 | gunicorn/run | AzureML Container Runtime Information
2022-08-22T17:05:02,161963207+00:00 | gunicorn/run | ###############################################
2022-08-22T17:05:02,168970479+00:00 | gunicorn/run |
2022-08-22T17:05:02,174364834+00:00 | gunicorn/run |
2022-08-22T17:05:02,187280665+00:00 | gunicorn/run | AzureML image information: openmpi4.1.0-ubuntu20.04, Materialization Build:20220708.v2
2022-08-22T17:05:02,188930082+00:00 | gunicorn/run |
2022-08-22T17:05:02,190557998+00:00 | gunicorn/run |
```
The build date of the image appears after the `Materialization Build` notation. In the preceding example, the image version is 20220708, or July 8, 2022, so the image is compatible with Flask 2. If you don't see a similar message in your container log, your image is out-of-date and should be updated. If you use a Compute Unified Device Architecture (CUDA) image and you can't find a newer image, check whether your image is deprecated in AzureML-Containers. You can find designated replacements for deprecated images.

If you use the server with an online endpoint, you can also find the logs under "Deployment logs" on the online endpoint page in Azure Machine Learning studio.

If you deploy with SDK v1 and don't explicitly specify an image in your deployment configuration, the server applies the `openmpi4.1.0-ubuntu20.04` image with a version that matches your local SDK toolset. However, the installed version might not be the latest available version of the image. For SDK version 1.43, the server installs the `openmpi4.1.0-ubuntu20.04:20220616` image version by default, but this image version isn't compatible with SDK 1.43. Make sure you use the latest SDK for your deployment.

If you can't update the image, you can temporarily avoid the issue by pinning the `azureml-defaults==1.43` or `azureml-inference-server-http~=0.4.13` entries in your environment file. These entries direct the server to install the older version with `flask 1.0.x`.
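A pinned environment file might look like the following sketch (the package pin comes from the source; the surrounding conda.yml structure and Python version are illustrative):

```yaml
# Illustrative conda.yml fragment: pin an older server version for Flask 1 compatibility
dependencies:
  - python=3.8
  - pip:
      - azureml-defaults==1.43
```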
ImportError or ModuleNotFoundError during server startup
You might encounter an `ImportError` or `ModuleNotFoundError` on specific modules during server startup, such as `opencensus`, `jinja2`, `markupsafe`, or `click`. Here's an example of the error message:
```
ImportError: cannot import name 'Markup' from 'jinja2'
```
The import and module errors occur when you use older versions of the server (version 0.4.10 and earlier) that don't pin the Flask dependency to a compatible version.
To prevent the issue, install a later version of the server.