Progressive rollout of MLflow models to Online Endpoints

In this article, you learn how you can progressively update and deploy MLflow models to Online Endpoints without causing service disruption. You use blue-green deployment, also known as a safe rollout strategy, to introduce a new version of a web service to production. This strategy allows you to roll out your new version of the web service to a small subset of users or requests before rolling it out completely.

About this example

Online endpoints distinguish between an endpoint and a deployment. An endpoint represents the API that customers use to consume the model, while a deployment is a specific implementation of that API. This distinction lets you decouple the API from the implementation and change the underlying implementation without affecting the consumer. This example uses these concepts to update the deployed model in an endpoint without introducing service disruption.
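The relationship between an endpoint and its deployments can be pictured with a small toy model. This is only a conceptual sketch, not the Azure Machine Learning API: the `Endpoint` and `Deployment` classes, their methods, and the rule that percentages must sum to exactly 100 are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """One concrete implementation behind the endpoint (toy model)."""
    name: str
    model_version: str


@dataclass
class Endpoint:
    """The stable API surface; consumers only ever see this name."""
    name: str
    deployments: dict = field(default_factory=dict)
    traffic: dict = field(default_factory=dict)

    def add_deployment(self, deployment: Deployment) -> None:
        # A newly added deployment starts with 0% of the traffic.
        self.deployments[deployment.name] = deployment
        self.traffic.setdefault(deployment.name, 0)

    def set_traffic(self, split: dict) -> None:
        # Every name must refer to an existing deployment, and in this
        # toy model the percentages must add up to 100.
        unknown = set(split) - set(self.deployments)
        if unknown:
            raise ValueError(f"Unknown deployments: {sorted(unknown)}")
        if sum(split.values()) != 100:
            raise ValueError("Traffic percentages must sum to 100")
        self.traffic = {name: split.get(name, 0) for name in self.deployments}


toy_endpoint = Endpoint(name="heart-classifier-edp")
toy_endpoint.add_deployment(Deployment("default", model_version="1"))
toy_endpoint.set_traffic({"default": 100})

# Add a second implementation without changing the endpoint name.
toy_endpoint.add_deployment(Deployment("xgboost-model-2", model_version="2"))
toy_endpoint.set_traffic({"default": 90, "xgboost-model-2": 10})
print(toy_endpoint.traffic)
```

The consumer-facing name never changes; only the set of deployments and the traffic split behind it do.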

The model you deploy is based on the UCI Heart Disease Data Set. The database contains 76 attributes, but this example uses a subset of 14 of them. The model tries to predict the presence of heart disease in a patient. The prediction is an integer: 0 (no presence) or 1 (presence). The model was trained using an XGBoost classifier, and all the required preprocessing is packaged as a scikit-learn pipeline, making this model an end-to-end pipeline that goes from raw data to predictions.

The information in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy/paste files, clone the repo, and then change directories to sdk/using-mlflow/deploy.

Follow along in Jupyter Notebooks

You can follow along with this sample in a Jupyter notebook. In the cloned repository, open the notebook: mlflow_sdk_online_endpoints_progresive.ipynb.

Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

An Azure subscription. If you don't have an Azure subscription, create a trial subscription before you begin.
Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the owner or contributor role for the Azure Machine Learning workspace, or a custom role allowing Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more information, see Manage access to an Azure Machine Learning workspace.

Additionally, you need to:

Install the Azure CLI and the ml extension to the Azure CLI. For more information, see Install, set up, and use the CLI (v2).

Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for MLflow azureml-mlflow.
```
pip install mlflow azureml-mlflow
```
If you aren't running in Azure Machine Learning compute, configure the MLflow tracking URI or MLflow's registry URI to point to the workspace you're working on. Learn how to configure MLflow for Azure Machine Learning.

Connect to your workspace

First, connect to the Azure Machine Learning workspace where you perform deployment tasks.

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, you connect to the workspace in which you perform deployment tasks.

Import the required libraries:

from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment, Model
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

Configure workspace details and get a handle to the workspace:

subscription_id = "<subscription>"
resource_group = "<resource-group>"
workspace = "<workspace>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

Import the required libraries

import json
import mlflow
import requests
import pandas as pd
from mlflow.deployments import get_deploy_client

Configure the MLflow client and the deployment client:

mlflow_client = mlflow.MlflowClient()
deployment_client = get_deploy_client(mlflow.get_tracking_uri())

Registering the model in the registry

Ensure your model is registered in the Azure Machine Learning registry. Deployment of unregistered models isn't supported in Azure Machine Learning. You can register a new model using the Azure CLI, the Azure Machine Learning SDK for Python, or the MLflow SDK:

MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "model"

model_name = 'heart-classifier'
model_local_path = "model"

model = ml_client.models.create_or_update(
     Model(name=model_name, path=model_local_path, type=AssetTypes.MLFLOW_MODEL)
)

model_name = 'heart-classifier'
model_local_path = "model"

registered_model = mlflow_client.create_model_version(
    name=model_name, source=f"file://{model_local_path}"
)
version = registered_model.version

Create an online endpoint

Online endpoints are endpoints that are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and can send responses back in real time.

You can exploit this functionality by deploying multiple versions of the same model under the same endpoint. The new deployment receives 0% of the traffic at the beginning. Once you're sure the new model works correctly, you progressively move traffic from one deployment to the other.
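To build intuition for what a percentage split means for individual requests, the following sketch simulates weighted routing with the standard library. It is an illustration only: `simulate_routing` is a hypothetical helper, the deployment names are examples, and how the service actually balances requests is not described here.

```python
import random
from collections import Counter


def simulate_routing(traffic: dict, n_requests: int, seed: int = 0) -> Counter:
    """Pick a deployment for each request, weighted by its traffic percentage."""
    rng = random.Random(seed)
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n_requests))


counts = simulate_routing({"default": 90, "xgboost-model-2": 10}, n_requests=10_000)
for name, count in counts.items():
    print(f"{name}: {count / 10_000:.1%} of requests")
```

A deployment with a weight of 0 is never selected in this simulation, which mirrors the idea that a new deployment can exist under the endpoint without serving live traffic.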

Endpoints require a name, which must be unique within the region. Generate one that is unlikely to already exist:

ENDPOINT_SUFFIX=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w ${1:-5} | head -n 1)
ENDPOINT_NAME="heart-classifier-$ENDPOINT_SUFFIX"

import random
import string

# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "heart-classifier-" + endpoint_suffix

print(f"Endpoint name: {endpoint_name}")
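If you want to catch an invalid name before calling the service, you can wrap the name generation in a function that also validates the result locally. This is a sketch under a stated assumption: the pattern below (starts with a letter, then letters, digits, or hyphens, 3 to 32 characters in total) is an assumed naming rule that you should verify against the current service limits. `make_endpoint_name` and `NAME_PATTERN` are hypothetical names.

```python
import re
import secrets
import string

# Assumed naming rule (verify against the current service limits):
# starts with a letter, then letters, digits, or hyphens, 3-32 chars total.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")


def make_endpoint_name(prefix: str = "heart-classifier", suffix_length: int = 5) -> str:
    """Return the prefix plus a random lowercase/digit suffix, validated locally."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(suffix_length))
    name = f"{prefix}-{suffix}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"{name!r} does not match the assumed naming rule")
    return name


candidate_name = make_endpoint_name()
print(f"Endpoint name: {candidate_name}")
```

A random suffix makes a collision unlikely but doesn't guarantee uniqueness; the service still has the final say when you create the endpoint.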

Configure the endpoint

Tip

This example uses key-based authentication for simplicity and is suitable for development and testing scenarios only. For production deployments, use Microsoft Entra token-based authentication (aad_token), which provides enhanced security through identity-based access control. For more information, see Authenticate clients for online endpoints.
endpoint.yml
```
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: heart-classifier-edp
auth_mode: key
```
```
endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    description="An endpoint to serve predictions of the UCI heart disease problem",
    auth_mode="key",
)
```
You can configure the properties of this endpoint using a configuration file. The following example configures the authentication mode of the endpoint to be "key":
```
endpoint_config = {
    "auth_mode": "key",
    "identity": {
        "type": "system_assigned"
    }
}
```
Write this configuration into a JSON file:
```
endpoint_config_path = "endpoint_config.json"
with open(endpoint_config_path, "w") as outfile:
    outfile.write(json.dumps(endpoint_config))
```
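Because the MLflow deployment client reads its settings from JSON files, the same write step recurs for the endpoint, deployment, and traffic configurations. A small helper can centralize it and check that the file reads back as expected. This is a minimal sketch; `write_json_config` is a hypothetical helper, not part of MLflow or the Azure SDK, and the temporary directory is used only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path


def write_json_config(config: dict, path) -> Path:
    """Serialize a config dict to JSON and confirm it reads back unchanged."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    # Round-trip check: catches values that JSON cannot represent faithfully,
    # such as non-string dictionary keys.
    if json.loads(path.read_text(encoding="utf-8")) != config:
        raise ValueError(f"{path} does not round-trip to the original config")
    return path


with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = write_json_config(
        {"auth_mode": "key", "identity": {"type": "system_assigned"}},
        Path(tmp_dir) / "endpoint_config.json",
    )
    written = json.loads(config_path.read_text(encoding="utf-8"))
    print(written)
```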

Create the endpoint:

az ml online-endpoint create -n $ENDPOINT_NAME -f endpoint.yml

ml_client.online_endpoints.begin_create_or_update(endpoint).result()

endpoint = deployment_client.create_endpoint(
    name=endpoint_name,
    config={"endpoint-config-file": endpoint_config_path},
)

Get the authentication secret for the endpoint:
```
ENDPOINT_SECRET_KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME -o tsv --query primaryKey)
```
```
endpoint_secret_key = ml_client.online_endpoints.get_keys(
    name=endpoint_name
).primary_key
```
This functionality isn't available in the MLflow SDK. Go to Azure Machine Learning studio, navigate to the endpoint, and retrieve the secret key from there.

Create a blue deployment

So far, the endpoint is empty. There are no deployments on it. Create the first one by deploying the same model you registered earlier. This deployment is called "default", representing the "blue deployment".

Configure the deployment

blue-deployment.yml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: default
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1

blue_deployment_name = "default"

Configure the hardware requirements of your deployment:

blue_deployment = ManagedOnlineDeployment(
    name=blue_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

blue_deployment_name = "default"

To configure the hardware requirements of your deployment, you need to create a JSON file with the desired configuration:

deploy_config = {
    "instance_type": "Standard_DS2_v2",
    "instance_count": 1,
}

Note

The full specification of this configuration can be found at Managed online deployment schema (v2).

Write the configuration to a file:

deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

Create the deployment

az ml online-deployment create --endpoint-name $ENDPOINT_NAME -f blue-deployment.yml --all-traffic

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the flag --with-package:

az ml online-deployment create --with-package --endpoint-name $ENDPOINT_NAME -f blue-deployment.yml --all-traffic

Tip

The flag --all-traffic in the create command assigns all the traffic to the new deployment.

ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

blue_deployment = deployment_client.create_deployment(
    name=blue_deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)

Assign all the traffic to the deployment

So far, the endpoint has one deployment, but none of its traffic is assigned to it. Assign it now.
This step is not required in the Azure CLI because the --all-traffic flag was used during creation.
```
endpoint.traffic = { blue_deployment_name: 100 }
```
```
traffic_config = {"traffic": {blue_deployment_name: 100}}
```
Write the configuration to a file:
```
traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))
```
Update the endpoint configuration:
This step is not required in the Azure CLI because the --all-traffic flag was used during creation.
```
ml_client.begin_create_or_update(endpoint).result()
```
```
deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)
```

Create a sample input to test the deployment

sample.json

{
    "input_data": {
        "columns": [
            "age",
            "sex",
            "cp",
            "trestbps",
            "chol",
            "fbs",
            "restecg",
            "thalach",
            "exang",
            "oldpeak",
            "slope",
            "ca",
            "thal"
        ],
        "data": [
            [ 48, 0, 3, 130, 275, 0, 0, 139, 0, 0.2, 1, 0, "normal" ]
        ]
    }
}
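The request body above pairs a list of column names with rows of values in the same order. The following sketch builds that structure with the standard library and checks that each row has one value per column. `build_request` is a hypothetical helper; the column names and the sample row are taken from the payload above.

```python
import json

COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal"]


def build_request(columns: list, rows: list) -> str:
    """Return a JSON request body using the columns/data layout."""
    for i, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(
                f"Row {i} has {len(row)} values, expected {len(columns)}"
            )
    return json.dumps({"input_data": {"columns": columns, "data": rows}})


body = build_request(
    COLUMNS, [[48, 0, 3, 130, 275, 0, 0, 139, 0, 0.2, 1, 0, "normal"]]
)
print(body)
```

Checking row lengths locally gives a clearer error than a failed scoring call when a value is missing from a row.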

The following code samples 5 observations from the training dataset, removes the target column (as the model predicts it), and creates a request in the file sample.json that can be used with the model deployment.

samples = (
    pd.read_csv("data/heart.csv")
    .sample(n=5)
    .drop(columns=["target"])
    .reset_index(drop=True)
)

with open("sample.json", "w") as f:
    f.write(
        json.dumps(
            {"input_data": json.loads(samples.to_json(orient="split", index=False))}
        )
    )

The following code samples 5 observations from the training dataset, removes the target column (as the model predicts it), and creates a request.

samples = (
    pd.read_csv("data/heart.csv")
    .sample(n=5)
    .drop(columns=["target"])
    .reset_index(drop=True)
)

Test the deployment

az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file sample.json

ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    request_file="sample.json",
)

deployment_client.predict(
    endpoint=endpoint_name, 
    df=samples
)
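You can also call the endpoint over plain HTTP. The following is a minimal sketch, assuming key-based authentication, where the key is sent as a bearer token and the optional azureml-model-deployment header targets one deployment by name instead of following the traffic split. The scoring URI and key are placeholders, `build_scoring_request` is a hypothetical helper, the payload is trimmed for brevity, and the request is only built, not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, payload, deployment_name=None):
    """Build (but do not send) a POST request for an online endpoint."""
    headers = {
        "Content-Type": "application/json",
        # Key-based auth passes the endpoint key as a bearer token.
        "Authorization": f"Bearer {key}",
    }
    if deployment_name is not None:
        # Route to one specific deployment instead of using the traffic split.
        headers["azureml-model-deployment"] = deployment_name
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri, data=data, headers=headers, method="POST"
    )


# Placeholder values; substitute your endpoint's scoring URI and key.
scoring_request = build_scoring_request(
    "https://my-endpoint.example.invalid/score",
    "<endpoint-secret-key>",
    {"input_data": {"columns": ["age"], "data": [[48]]}},
)
print(scoring_request.get_method(), scoring_request.full_url)
# To send it: urllib.request.urlopen(scoring_request).read()
```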

Create a green deployment under the endpoint

Imagine that the development team creates a new version of the model and it's ready for production. You can first test this model and, once you're confident, update the endpoint to route traffic to it.

MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "model"

Get the version number of the new model:

VERSION=$(az ml model show -n heart-classifier --label latest | jq -r ".version")

model_name = 'heart-classifier'
model_local_path = "model"

model = ml_client.models.create_or_update(
     Model(name=model_name, path=model_local_path, type=AssetTypes.MLFLOW_MODEL)
)
version = model.version

model_name = 'heart-classifier'
model_local_path = "model"

registered_model = mlflow_client.create_model_version(
    name=model_name, source=f"file://{model_local_path}"
)
version = registered_model.version

Configure a new deployment

green-deployment.yml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: xgboost-model
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1

Name the deployment as follows:

GREEN_DEPLOYMENT_NAME="xgboost-model-$VERSION"

green_deployment_name = f"xgboost-model-{version}"

Configure the hardware requirements of your deployment:

green_deployment = ManagedOnlineDeployment(
    name=green_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the argument with_package=True:

green_deployment = ManagedOnlineDeployment(
    name=green_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
    with_package=True,
)

green_deployment_name = f"xgboost-model-{version}"

To configure the hardware requirements of your deployment, you need to create a JSON file with the desired configuration:

deploy_config = {
    "instance_type": "Standard_DS2_v2",
    "instance_count": 1,
}

Tip

This example uses the same hardware configuration indicated in the deployment-config-file. However, there's no requirement to have the same configuration. You can configure different hardware for different models depending on the requirements.

Write the configuration to a file:

deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

Create the new deployment

az ml online-deployment create -n $GREEN_DEPLOYMENT_NAME --endpoint-name $ENDPOINT_NAME -f green-deployment.yml

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the flag --with-package:

az ml online-deployment create --with-package -n $GREEN_DEPLOYMENT_NAME --endpoint-name $ENDPOINT_NAME -f green-deployment.yml

ml_client.online_deployments.begin_create_or_update(green_deployment).result()

new_deployment = deployment_client.create_deployment(
    name=green_deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)

Test the deployment without changing traffic

az ml online-endpoint invoke --name $ENDPOINT_NAME --deployment-name $GREEN_DEPLOYMENT_NAME --request-file sample.json

ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name=green_deployment_name,
    request_file="sample.json",
)

deployment_client.predict(
    endpoint=endpoint_name, 
    deployment_name=green_deployment_name, 
    df=samples
)

Tip

Notice how the name of the deployment to invoke is now specified.

Progressively update the traffic

Once you're confident in the new deployment, you can update the traffic to route some of it to the new deployment. Traffic is configured at the endpoint level.

Configure the traffic:

This step is not required in the Azure CLI.

endpoint.traffic = {blue_deployment_name: 90, green_deployment_name: 10}

traffic_config = {"traffic": {blue_deployment_name: 90, green_deployment_name: 10}}

Write the configuration to a file:

traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))

Update the endpoint

az ml online-endpoint update --name $ENDPOINT_NAME --traffic "default=90 $GREEN_DEPLOYMENT_NAME=10"

ml_client.begin_create_or_update(endpoint).result()

deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)
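A progressive rollout usually repeats the update above with a growing share for the new deployment. The following sketch generates the sequence of traffic splits for such a ramp. The percentages are a hypothetical schedule, and `traffic_steps` is an illustrative helper; each resulting dictionary has the same shape as the traffic mapping used above.

```python
def traffic_steps(blue: str, green: str, green_percentages: list) -> list:
    """Return one traffic split per rollout step, each summing to 100."""
    steps = []
    for pct in green_percentages:
        if not 0 <= pct <= 100:
            raise ValueError(f"Percentage out of range: {pct}")
        steps.append({blue: 100 - pct, green: pct})
    return steps


# Hypothetical ramp: 10% -> 25% -> 50% -> 100% of traffic to green.
rollout_plan = traffic_steps("default", "xgboost-model-2", [10, 25, 50, 100])
for step in rollout_plan:
    print(step)
```

In practice, you would apply one step, observe the new deployment for a while, and only then move on to the next step.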

If you decide to switch the entire traffic to the new deployment, update all the traffic:

This step is not required in the Azure CLI.

endpoint.traffic = {blue_deployment_name: 0, green_deployment_name: 100}

traffic_config = {"traffic": {blue_deployment_name: 0, green_deployment_name: 100}}

Write the configuration to a file:

traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))

Update the endpoint

az ml online-endpoint update --name $ENDPOINT_NAME --traffic "default=0 $GREEN_DEPLOYMENT_NAME=100"

ml_client.begin_create_or_update(endpoint).result()

deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)

Since the old deployment doesn't receive any traffic, you can safely delete it:

az ml online-deployment delete --endpoint-name $ENDPOINT_NAME --name default

ml_client.online_deployments.begin_delete(
    name=blue_deployment_name, 
    endpoint_name=endpoint_name
)

deployment_client.delete_deployment(
    blue_deployment_name, 
    endpoint=endpoint_name
)
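Before you run a delete like the ones above in an automated script, it can help to confirm that the deployment no longer receives traffic. The following is a minimal sketch of such a guard. It operates on a plain dictionary with the same shape as the traffic mapping used earlier; `assert_no_traffic` is a hypothetical helper, and fetching the current split from the service is left out.

```python
def assert_no_traffic(traffic: dict, deployment_name: str) -> None:
    """Raise if the deployment still receives a share of the traffic."""
    share = traffic.get(deployment_name, 0)
    if share != 0:
        raise RuntimeError(
            f"Deployment {deployment_name!r} still receives {share}% of traffic"
        )


current_traffic = {"default": 0, "xgboost-model-2": 100}
assert_no_traffic(current_traffic, "default")  # passes silently

try:
    assert_no_traffic(current_traffic, "xgboost-model-2")
except RuntimeError as err:
    blocked_message = str(err)
    print(blocked_message)
```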

Tip

At this point, the former "blue deployment" has been deleted and the new "green deployment" has taken the place of the "blue deployment".

Clean up resources

az ml online-endpoint delete --name $ENDPOINT_NAME --yes

ml_client.online_endpoints.begin_delete(name=endpoint_name)

deployment_client.delete_endpoint(endpoint_name)

Important

Deleting an endpoint also deletes all the deployments under it.

Next steps

Last updated on 2026-04-10
