Progressive rollout of MLflow models to Online Endpoints
Article
In this article, you learn how you can progressively update and deploy MLflow models to Online Endpoints without causing service disruption. You use blue-green deployment, also known as a safe rollout strategy, to introduce a new version of a web service to production. This strategy will allow you to roll out your new version of the web service to a small subset of users or requests before rolling it out completely.
About this example
Online Endpoints have the concept of Endpoint and Deployment. An endpoint represents the API that customers use to consume the model, while the deployment indicates the specific implementation of that API. This distinction allows users to decouple the API from the implementation and to change the underlying implementation without affecting the consumer. This example will use such concepts to update the deployed model in endpoints without introducing service disruption.
The model we will deploy is based on the UCI Heart Disease Data Set. The database contains 76 attributes, but we are using a subset of 14 of them. The model tries to predict the presence of heart disease in a patient. It is integer valued from 0 (no presence) to 1 (presence). It has been trained using an XGBBoost classifier and all the required preprocessing has been packaged as a scikit-learn pipeline, making this model an end-to-end pipeline that goes from raw data to predictions.
The information in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy/paste files, clone the repo, and then change directories to sdk/using-mlflow/deploy.
Before following the steps in this article, make sure you have the following prerequisites:
An Azure subscription. If you don't have an Azure subscription, create a trial subscription before you begin. Try the trial subscription.
Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the owner or contributor role for the Azure Machine Learning workspace, or a custom role allowing Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more information, see Manage access to an Azure Machine Learning workspace.
Install the Mlflow SDK package mlflow and the Azure Machine Learning plug-in for MLflow azureml-mlflow.
pip install mlflow azureml-mlflow
If you are not running in Azure Machine Learning compute, configure the MLflow tracking URI or MLflow's registry URI to point to the workspace you are working on. Learn how to configure MLflow for Azure Machine Learning.
Connect to your workspace
First, let's connect to Azure Machine Learning workspace where we are going to work on.
az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>
The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, we'll connect to the workspace in which you'll perform deployment tasks.
Import the required libraries:
from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment, Model
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential
Configure workspace details and get a handle to the workspace:
Ensure your model is registered in Azure Machine Learning registry. Deployment of unregistered models is not supported in Azure Machine Learning. You can register a new model using the MLflow SDK:
Online endpoints are endpoints that are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and can send responses back in real time.
We are going to exploit this functionality by deploying multiple versions of the same model under the same endpoint. However, the new deployment will receive 0% of the traffic at the begging. Once we are sure about the new model to work correctly, we are going to progressively move traffic from one deployment to the other.
Endpoints require a name, which needs to be unique in the same region. Let's ensure to create one that doesn't exist:
import random
import string
# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "heart-classifier-" + endpoint_suffix
print(f"Endpoint name: {endpoint_name}")
import random
import string
# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "heart-classifier-" + endpoint_suffix
print(f"Endpoint name: {endpoint_name}")
endpoint = ManagedOnlineEndpoint(
name=endpoint_name,
description="An endpoint to serve predictions of the UCI heart disease problem",
auth_mode="key",
)
We can configure the properties of this endpoint using a configuration file. We configure the authentication mode of the endpoint to be "key" in the following example:
This functionality is not available in the MLflow SDK. Go to Azure Machine Learning studio, navigate to the endpoint, and retrieve the secret key from there.
Create a blue deployment
So far, the endpoint is empty. There are no deployments on it. Let's create the first one by deploying the same model we were working on before. We will call this deployment "default", representing our "blue deployment".
The following code samples 5 observations from the training dataset, removes the target column (as the model will predict it), and creates a request in the file sample.json that can be used with the model deployment.
samples = (
pd.read_csv("data/heart.csv")
.sample(n=5)
.drop(columns=["target"])
.reset_index(drop=True)
)
with open("sample.json", "w") as f:
f.write(
json.dumps(
{"input_data": json.loads(samples.to_json(orient="split", index=False))}
)
)
The following code samples 5 observations from the training dataset, removes the target column (as the model will predict it), and creates a request.
Let's imagine that there is a new version of the model created by the development team and it is ready to be in production. We can first try to fly this model and once we are confident, we can update the endpoint to route the traffic to it.
We are using the same hardware confirmation indicated in the deployment-config-file. However, there is no requirements to have the same configuration. You can configure different hardware for different models depending on the requirements.
Write the configuration to a file:
deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
outfile.write(json.dumps(deploy_config))
Notice how now we are indicating the name of the deployment we want to invoke.
Progressively update the traffic
One we are confident with the new deployment, we can update the traffic to route some of it to the new deployment. Traffic is configured at the endpoint level: