In this article, you'll learn how to deploy a new version of a machine learning model in production without causing any disruption. You'll use a blue-green deployment strategy (also known as a safe rollout strategy) to introduce a new version of a web service to production. This strategy will allow you to roll out your new version of the web service to a small subset of users or requests before rolling it out completely.
This article assumes you're using online endpoints, that is, endpoints that are used for online (real-time) inferencing. There are two types of online endpoints: managed online endpoints and Kubernetes online endpoints. For more information on endpoints and the differences between managed online endpoints and Kubernetes online endpoints, see What are Azure Machine Learning endpoints?.
The main example in this article uses managed online endpoints for deployment. To use Kubernetes endpoints instead, see the notes in this document that are inline with the managed online endpoint discussion.
In this article, you'll learn to:
Define an online endpoint with a deployment called "blue" to serve version 1 of a model
Scale the blue deployment so that it can handle more requests
Deploy version 2 of the model (called the "green" deployment) to the endpoint, but send the deployment no live traffic
Test the green deployment in isolation
Mirror a percentage of live traffic to the green deployment to validate it
Send a small percentage of live traffic to the green deployment
Send over all live traffic to the green deployment
The CLI examples in this article assume that you're using the Bash (or compatible) shell, for example, on a Linux system or in Windows Subsystem for Linux.
Before following the steps in this article, make sure you have the following prerequisites:
An Azure subscription. If you don't have an Azure subscription, create a trial subscription before you begin.
An Azure Machine Learning workspace and a compute instance. If you don't have these, use the steps in the Quickstart: Create workspace resources article to create them.
Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the owner or contributor role for the Azure Machine Learning workspace, or a custom role allowing Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more information, see Manage access to an Azure Machine Learning workspace.
(Optional) To deploy locally, you must install Docker Engine on your local computer. We highly recommend this option because it makes it easier to debug issues.
If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid passing in the values for your subscription, workspace, and resource group multiple times, run this code:
az account set --subscription <subscription id>
az configure --defaults workspace=<Azure Machine Learning workspace name> group=<resource group>
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples
cd cli
Tip
Use --depth 1 to clone only the latest commit to the repository. This reduces the time to complete the operation.
The commands in this tutorial are in the file deploy-safe-rollout-online-endpoints.sh in the cli directory, and the YAML configuration files are in the endpoints/online/managed/sample/ subdirectory.
Note
The YAML configuration files for Kubernetes online endpoints are in the endpoints/online/kubernetes/ subdirectory.
Clone the examples repository
To run the examples, first clone the examples repository (azureml-examples). Then, go into the azureml-examples/sdk/python/endpoints/online/managed directory:
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples/sdk/python/endpoints/online/managed
Tip
Use --depth 1 to clone only the latest commit to the repository. This reduces the time to complete the operation.
The information in this article is based on the online-endpoints-safe-rollout.ipynb notebook. It contains the same content as this article, although the order of the code is slightly different.
The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, we'll connect to the workspace where you'll perform deployment tasks.
Import the required libraries:
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential
Note
If you're using the Kubernetes online endpoint, import the KubernetesOnlineEndpoint and KubernetesOnlineDeployment classes from the azure.ai.ml.entities library.
Configure workspace details and get a handle to the workspace:
To connect to a workspace, we need identifying parameters: a subscription ID, resource group, and workspace name. We'll use these details in the MLClient from azure.ai.ml to get a handle to the required Azure Machine Learning workspace. This example uses the default Azure authentication.
# enter details of your AML workspace
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"
# get a handle to the workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)
If you have Git installed on your local machine, you can follow the instructions to clone the examples repository. Otherwise, follow the instructions to download files from the examples repository.
Clone the examples repository
To follow along with this article, first clone the examples repository (azureml-examples) and then change into the azureml-examples/cli/endpoints/online/model-1 directory.
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples/cli/endpoints/online/model-1
Tip
Use --depth 1 to clone only the latest commit to the repository, which reduces time to complete the operation.
Download files from the examples repository
If you cloned the examples repo, your local machine already has copies of the files for this example, and you can skip to the next section. If you didn't clone the repo, you can download it to your local machine.
Go to the <> Code button on the page, and then select Download ZIP from the Local tab.
Locate the model folder /cli/endpoints/online/model-1/model and the scoring script /cli/endpoints/online/model-1/onlinescoring/score.py for the first model, model-1.
Locate the model folder /cli/endpoints/online/model-2/model and the scoring script /cli/endpoints/online/model-2/onlinescoring/score.py for the second model, model-2.
Define the endpoint and deployment
Online endpoints are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and send responses back in real time.
Define an endpoint
The following table lists key attributes to specify when you define an endpoint.
Name: Required. The name of the endpoint. It must be unique in the Azure region. For more information on the naming rules, see endpoint limits.
Authentication mode: The authentication method for the endpoint. Choose between key-based authentication (key) and Azure Machine Learning token-based authentication (aml_token). A key doesn't expire, but a token does. For more information on authenticating, see Authenticate to an online endpoint.
Description: A description of the endpoint.
Tags: A dictionary of tags for the endpoint.
Traffic: Rules on how to route traffic across deployments. Represent the traffic as a dictionary of key-value pairs, where the key is the deployment name and the value is the percentage of traffic to that deployment. You can set the traffic only after the deployments under an endpoint have been created, and you can update it at any point afterward. For more information on how to use mirrored traffic, see Allocate a small percentage of live traffic to the new deployment.
A deployment is a set of resources required for hosting the model that does the actual inferencing. The following table describes key attributes to specify when you define a deployment.
Name: Required. The name of the deployment.
Endpoint name: Required. The name of the endpoint to create the deployment under.
Model: The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification. In this example, we have a scikit-learn model that does regression.
Code path: The path to the directory on the local development environment that contains all the Python source code for scoring the model. You can use nested directories and packages.
Scoring script: Python code that executes the model on a given input request. This value can be the relative path to the scoring file in the source code directory. The scoring script receives data submitted to a deployed web service and passes it to the model. The script then executes the model and returns its response to the client. The scoring script is specific to your model and must understand the data that the model expects as input and returns as output. In this example, we have a score.py file. This Python code must have an init() function and a run() function. The init() function is called after the model is created or updated (you can use it to cache the model in memory, for example). The run() function is called at every invocation of the endpoint to do the actual scoring and prediction.
Environment: Required. The environment to host the model and code. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. The environment can be a Docker image with Conda dependencies, a Dockerfile, or a registered environment.
Instance count: Required. The number of instances to use for the deployment. Base the value on the workload you expect. For high availability, we recommend that you set the value to at least 3. We reserve an extra 20% for performing upgrades. For more information, see limits for online endpoints.
First set the endpoint's name and then configure it. In this article, you'll use the endpoints/online/managed/sample/endpoint.yml file to configure the endpoint. The following snippet shows the contents of the file:
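The file typically contains something like the following minimal sketch (the name value is a placeholder; see the file in the cloned repository for the authoritative contents):
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key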
The reference for the endpoint YAML format is described in the following table. To learn how to specify these attributes, see the online endpoint YAML reference. For information about limits related to managed online endpoints, see limits for online endpoints.
$schema: (Optional) The YAML schema. To see all available options in the YAML file, you can view the schema in the preceding code snippet in a browser.
name: The name of the endpoint.
auth_mode: Use key for key-based authentication. Use aml_token for Azure Machine Learning token-based authentication. To get the most recent token, use the az ml online-endpoint get-credentials command.
To create an online endpoint:
Set your endpoint name:
For Unix, run this command (replace YOUR_ENDPOINT_NAME with a unique name):
export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
Important
Endpoint names must be unique within an Azure region. For example, in the Azure chinanorth2 region, there can be only one endpoint with the name my-endpoint.
Create the endpoint in the cloud:
Run the following code to use the endpoint.yml file to configure the endpoint:
az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml
Create the 'blue' deployment
In this article, you'll use the endpoints/online/managed/sample/blue-deployment.yml file to configure the key aspects of the deployment. The following snippet shows the contents of the file:
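As a sketch of what the file typically contains (the paths, image, and instance settings here are illustrative; see the file in the cloned repository for the authoritative contents):
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: ../../model-1/model/
code_configuration:
  code: ../../model-1/onlinescoring/
  scoring_script: score.py
environment:
  conda_file: ../../model-1/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
instance_type: Standard_DS3_v2
instance_count: 1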
To create a deployment named blue for your endpoint, run the following command, which uses the blue-deployment.yml file to configure the deployment:
az ml online-deployment create --name blue --endpoint-name $ENDPOINT_NAME -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic
Important
The --all-traffic flag in the az ml online-deployment create command allocates 100% of the endpoint traffic to the newly created blue deployment.
In the blue-deployment.yml file, we specify the path (where to upload files from) inline. The CLI automatically uploads the files and registers the model and environment. As a best practice for production, you should register the model and environment and specify the registered name and version separately in the YAML. Use the form model: azureml:my-model:1 or environment: azureml:my-env:1.
For registration, you can extract the YAML definitions of model and environment into separate YAML files and use the commands az ml model create and az ml environment create. To learn more about these commands, run az ml model create -h and az ml environment create -h.
To create a managed online endpoint, use the ManagedOnlineEndpoint class. This class allows users to configure the key aspects of the endpoint.
Configure the endpoint:
# Creating a unique endpoint name with current datetime to avoid conflicts
import datetime

online_endpoint_name = "endpoint-" + datetime.datetime.now().strftime("%m%d%H%M%f")

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is a sample online endpoint",
    auth_mode="key",
    tags={"foo": "bar"},
)
Note
To create a Kubernetes online endpoint, use the KubernetesOnlineEndpoint class.
To create a deployment for your managed online endpoint, use the ManagedOnlineDeployment class. This class allows users to configure the key aspects of the deployment.
The deployment's key attributes are the same ones described in the table in the Define the endpoint and deployment section earlier in this article.
In this example, we specify the path (where to upload files from) inline. The SDK automatically uploads the files and registers the model and environment. As a best practice for production, you should register the model and environment and specify the registered name and version separately in your code.
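As a minimal sketch of configuring and creating the blue deployment with the SDK (the model path, conda file, and image below are assumptions based on the examples repository; adjust them to your own files):
# configure the blue deployment for version 1 of the model
model = Model(path="../model-1/model/")
env = Environment(
    conda_file="../model-1/environment/conda.yml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="../model-1/onlinescoring", scoring_script="score.py"
    ),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
# create the endpoint, then the deployment, in the workspace
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
ml_client.online_deployments.begin_create_or_update(blue_deployment).result()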
When you create a managed online endpoint in the Azure Machine Learning studio, you must define an initial deployment for the endpoint. Before you can define a deployment, you must have a registered model in your workspace. Let's begin by registering the model to use for the deployment.
Register your model
A model registration is a logical entity in the workspace. This entity can contain a single model file or a directory of multiple files. As a best practice for production, you should register the model and environment. When creating the endpoint and deployment in this article, we'll assume that you've registered the model folder that contains the model.
To register the example model, follow these steps:
In the left navigation bar, select the Models page.
Select Register, and then choose From local files.
Select Unspecified type for the Model type.
Select Browse, and choose Browse folder.
Select the \azureml-examples\cli\endpoints\online\model-1\model folder from the local copy of the repo you cloned or downloaded earlier. When prompted, select Upload and wait for the upload to complete.
Select Next after the folder upload is completed.
Enter a friendly Name for the model. The steps in this article assume the model is named model-1.
Select Next, and then Register to complete registration.
Repeat the previous steps to register a model-2 from the \azureml-examples\cli\endpoints\online\model-2\model folder in the local copy of the repo you cloned or downloaded earlier.
For information on creating an environment in the studio, see Create an environment.
Create a managed online endpoint and the 'blue' deployment
Use the Azure Machine Learning studio to create a managed online endpoint directly in your browser. When you create a managed online endpoint in the studio, you must define an initial deployment. You can't create an empty managed online endpoint.
One way to create a managed online endpoint in the studio is from the Models page. This method also provides an easy way to add a model to an existing managed online deployment. To deploy the model named model-1 that you registered previously in the Register your model section:
In the left navigation bar, select the Models page.
Select the model named model-1 by checking the circle next to its name.
Select Deploy > Real-time endpoint.
This action opens up a window where you can specify details about your endpoint.
Enter an Endpoint name.
Keep the default selections: Managed for the compute type and key-based authentication for the authentication type.
Select Next until you get to the "Deployment" page. Here, perform the following tasks:
Name the deployment "blue".
Check the box for Enable Application Insights diagnostics and data collection to allow you to view graphs of your endpoint's activities in the studio later.
Select Next to go to the "Environment" page. Here, perform the following steps:
In the "Select scoring file and dependencies" box, browse and select the \azureml-examples\cli\endpoints\online\model-1\onlinescoring\score.py file from the repo you cloned or downloaded earlier.
Start typing sklearn in the search box above the list of environments, and select the AzureML-sklearn-0.24 curated environment.
Select Next to go to the "Compute" page. Here, keep the default selection for the virtual machine "Standard_DS3_v2" and change the Instance count to 1.
Select Next, to accept the default traffic allocation (100%) to the blue deployment.
Review your deployment settings and select the Create button.
Alternatively, you can create a managed online endpoint from the Endpoints page in the studio.
In the left navigation bar, select the Endpoints page.
Select + Create.
This action opens up a window for you to specify details about your endpoint and deployment. Enter settings for your endpoint and deployment as described in the previous steps 5-11, accepting defaults until you're prompted to Create the deployment.
Confirm your existing deployment
One way to confirm your existing deployment is to invoke your endpoint so that it can score your model for a given input request. When you invoke your endpoint via the CLI or Python SDK, you can choose to specify the name of the deployment that will receive the incoming traffic.
Note
Unlike the CLI or Python SDK, Azure Machine Learning studio requires you to specify a deployment when you invoke an endpoint.
Invoke endpoint with deployment name
If you invoke the endpoint with the name of the deployment that will receive traffic, Azure Machine Learning will route the endpoint's traffic directly to the specified deployment and return its output. You can use the --deployment-name option for CLI v2, or deployment_name option for SDK v2 to specify the deployment.
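For example, the following CLI command invokes the blue deployment directly (assuming the sample request file from the examples repository):
az ml online-endpoint invoke --name $ENDPOINT_NAME --deployment-name blue --request-file endpoints/online/model-1/sample-request.json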
Invoke endpoint without specifying deployment
If you invoke the endpoint without specifying the deployment that will receive traffic, Azure Machine Learning will route the endpoint's incoming traffic to the deployment(s) in the endpoint based on traffic control settings.
Traffic control settings allocate specified percentages of incoming traffic to each deployment in the endpoint. For example, if your traffic rules specify that a particular deployment in your endpoint will receive incoming traffic 40% of the time, Azure Machine Learning will route 40% of the endpoint's traffic to that deployment.
Using the MLClient created earlier, we'll get a handle to the endpoint. The endpoint can be invoked using the invoke command with the following parameters:
endpoint_name - Name of the endpoint
request_file - File with request data
deployment_name - Name of the specific deployment to test in an endpoint
# test the blue deployment with some sample data
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="blue",
    request_file="../model-1/sample-request.json",
)
View managed online endpoints
You can view all your managed online endpoints in the Endpoints page. Go to the endpoint's Details page to find critical information including the endpoint URI, status, testing tools, activity monitors, deployment logs, and sample consumption code:
In the left navigation bar, select Endpoints. Here, you can see a list of all the endpoints in the workspace.
(Optional) Create a Filter on Compute type to show only Managed compute types.
Select an endpoint name to view the endpoint's Details page.
Test the endpoint with sample data
Use the Test tab in the endpoint's details page to test your managed online deployment. Enter sample input and view the results.
Select the Test tab in the endpoint's detail page. The blue deployment is already selected in the dropdown menu.
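Scale your existing deployment to handle more traffic
To scale the blue deployment so that it can handle more requests, update its instance count: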
az ml online-deployment update --name blue --endpoint-name $ENDPOINT_NAME --set instance_count=2
Note
Notice that in the preceding command, we use --set to override the deployment configuration. Alternatively, you can update the YAML file and pass it to the update command using the --file flag.
Using the MLClient created earlier, we'll get a handle to the deployment. The deployment can be scaled by increasing or decreasing the instance_count.
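A minimal sketch of scaling via the SDK, assuming the blue deployment created earlier:
# fetch the blue deployment and increase its instance count
blue_deployment = ml_client.online_deployments.get(
    name="blue", endpoint_name=online_endpoint_name
)
blue_deployment.instance_count = 2
ml_client.online_deployments.begin_create_or_update(blue_deployment).result()
You can then confirm the endpoint's traffic settings and scoring URI: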
# Get the details for online endpoint
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)
# existing traffic details
print(endpoint.traffic)
# Get the scoring URI
print(endpoint.scoring_uri)
Use the following instructions to scale the deployment up or down by adjusting the number of instances:
In the endpoint Details page, find the card for the blue deployment.
Select the edit icon in the header of the blue deployment's card, update the instance count, and then select Update.
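Deploy a new model, but send it no traffic yet
Create a new deployment named green: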
az ml online-deployment create --name green --endpoint-name $ENDPOINT_NAME -f endpoints/online/managed/sample/green-deployment.yml
Since we haven't explicitly allocated any traffic to green, it has zero traffic allocated to it. You can verify that using the command:
az ml online-endpoint show -n $ENDPOINT_NAME --query traffic
Test the new deployment
Though green has 0% of traffic allocated, you can invoke it directly by specifying the --deployment-name flag:
az ml online-endpoint invoke --name $ENDPOINT_NAME --deployment-name green --request-file endpoints/online/model-2/sample-request.json
If you want to use a REST client to invoke the deployment directly without going through traffic rules, set the following HTTP header: azureml-model-deployment: <deployment-name>. The following snippet uses curl to invoke the deployment directly and should work in Unix/WSL environments:
# get the scoring uri
SCORING_URI=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query scoring_uri)
# get the endpoint's API key (assumes key-based authentication)
ENDPOINT_KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME -o tsv --query primaryKey)
# use curl to invoke the endpoint
curl --request POST "$SCORING_URI" --header "Authorization: Bearer $ENDPOINT_KEY" --header 'Content-Type: application/json' --header "azureml-model-deployment: green" --data @endpoints/online/model-2/sample-request.json
Create a new deployment for your managed online endpoint and name the deployment green:
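The green_deployment object is assumed to be configured like the blue deployment, but pointing at version 2 of the model; a sketch (paths and image are illustrative):
# configure the green deployment for version 2 of the model
model2 = Model(path="../model-2/model/")
env2 = Environment(
    conda_file="../model-2/environment/conda.yml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
green_deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name=online_endpoint_name,
    model=model2,
    environment=env2,
    code_configuration=CodeConfiguration(
        code="../model-2/onlinescoring", scoring_script="score.py"
    ),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)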
# use MLClient to create green deployment
ml_client.begin_create_or_update(green_deployment).result()
Note
If you're creating a deployment for a Kubernetes online endpoint, use the KubernetesOnlineDeployment class and specify a Kubernetes instance type in your Kubernetes cluster.
Test the new deployment
Though green has 0% of traffic allocated, you can still invoke the endpoint and deployment directly with a sample request JSON file.
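For example (the sample request file path is an assumption based on the examples repository):
# test the green deployment directly with some sample data
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="green",
    request_file="../model-2/sample-request.json",
)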
Create a new deployment to add to your managed online endpoint and name the deployment green.
From the Endpoint details page
Select + Add Deployment button in the endpoint "Details" page.
Select Deploy a model.
Select Next to go to the "Model" page and select the model model-2.
Select Next to go to the "Deployment" page and perform the following tasks:
Name the deployment "green".
Enable application insights diagnostics and data collection.
Select Next to go to the "Environment" page. Here, perform the following steps:
In the "Select scoring file and dependencies" box, browse and select the \azureml-examples\cli\endpoints\online\model-2\onlinescoring\score.py file from the repo you cloned or downloaded earlier.
Start typing sklearn in the search box above the list of environments, and select the AzureML-sklearn-0.24 curated environment.
Select Next to go to the "Compute" page. Here, keep the default selection for the virtual machine "Standard_DS3_v2" and change the Instance count to 1.
Select Next to go to the "Traffic" page. Here, keep the default traffic allocation to the deployments (100% traffic to "blue" and 0% traffic to "green").
Select Next to review your deployment settings.
Select Create to create the deployment.
Alternatively, you can use the Models page to add a deployment:
In the left navigation bar, select the Models page.
Select a model by checking the circle next to the model name.
Select Deploy > Real-time endpoint.
Choose to deploy to an existing managed online endpoint.
Follow the previous steps 3 to 9 to finish creating the green deployment.
Note
When adding a new deployment to an endpoint, you can adjust the traffic balance between deployments on the "Traffic" page. At this point, though, you should keep the default traffic allocation to the deployments (100% traffic to "blue" and 0% traffic to "green").
Test the new deployment
Though green has 0% of traffic allocated, you can still invoke the endpoint and deployment. Use the Test tab in the endpoint's details page to test your managed online deployment. Enter sample input and view the results.
Select the Test tab in the endpoint's detail page.
Select the green deployment from the dropdown menu.
Once you've tested your green deployment, you can mirror (or copy) a percentage of the live traffic to it. Traffic mirroring (also called shadowing) doesn't change the results returned to clients; requests still flow 100% to the blue deployment. The mirrored percentage of the traffic is copied and submitted to the green deployment so that you can gather metrics and logging without impacting your clients. Mirroring is useful when you want to validate a new deployment without impacting clients, for example, to check whether latency is within acceptable bounds or to verify that there are no HTTP errors. Testing the new deployment with traffic mirroring/shadowing is also known as shadow testing. The deployment that receives the mirrored traffic (in this case, the green deployment) can also be called the shadow deployment.
Mirroring has the following limitations:
Mirroring is supported for the CLI (v2) (version 2.4.0 or above) and Python SDK (v2) (version 1.0.0 or above). If you use an older version of CLI/SDK to update an endpoint, you'll lose the mirror traffic setting.
Mirroring isn't currently supported for Kubernetes online endpoints.
You can mirror traffic to only one deployment in an endpoint.
The maximum percentage of traffic you can mirror is 50%. This limit reduces the effect on your endpoint bandwidth quota (default 5 MBps); your endpoint bandwidth is throttled if you exceed the allocated quota. For information on monitoring bandwidth throttling, see Monitor managed online endpoints.
Also note the following behaviors:
A deployment can be configured to receive only live traffic or mirrored traffic, not both.
When you invoke an endpoint, you can specify the name of any of its deployments, even a shadow deployment, to return the prediction.
When you invoke an endpoint with the name of the deployment that will receive incoming traffic, Azure Machine Learning won't mirror traffic to the shadow deployment. Azure Machine Learning mirrors traffic to the shadow deployment from traffic sent to the endpoint when you don't specify a deployment.
Now, let's set the green deployment to receive 10% of mirrored traffic. Clients will still receive predictions from the blue deployment only.
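With the CLI, this is a single update (the --mirror-traffic flag requires CLI v2 version 2.4.0 or above):
az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=10"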
You can test mirror traffic by invoking the endpoint several times without specifying a deployment to receive the incoming traffic:
# You can test mirror traffic by invoking the endpoint several times
for i in range(20):
    ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        request_file="../model-1/sample-request.json",
    )
You can confirm that the specific percentage of the traffic was sent to the green deployment by checking the logs from the deployment:
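For example, with the CLI:
az ml online-deployment get-logs --name green --endpoint-name $ENDPOINT_NAME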
To mirror 10% of the traffic to the green deployment:
From the endpoint Details page, Select Update traffic.
Slide the button to Enable mirrored traffic.
Select the green deployment in the "Deployment name" dropdown menu.
Keep the default traffic allocation of 10%.
Select Update.
The endpoint details page now shows mirrored traffic allocation of 10% to the green deployment.
To test mirrored traffic, see the Azure CLI or Python tabs to invoke the endpoint several times. Confirm that the specific percentage of the traffic was sent to the green deployment by checking the logs from the deployment. You can access the deployment logs from the endpoint's Deployment logs tab. You can also use Metrics and Logs to monitor the performance of the mirrored traffic. For more information, see Monitor online endpoints.
After testing, you can disable mirroring:
From the endpoint Details page, Select Update traffic.
Slide the button next to Enable mirrored traffic again to disable mirrored traffic.
Select Update.
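With the CLI, you can achieve the same result by setting the mirrored percentage back to zero:
az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=0"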
Allocate a small percentage of live traffic to the new deployment
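Once you've tested your green deployment, allocate a small percentage of live traffic to it. With the CLI, the allocation looks like this:
az ml online-endpoint update --name $ENDPOINT_NAME --traffic "blue=90 green=10"
Tip
The total traffic percentage must sum to either 0% (to disable traffic) or 100% (to enable traffic).
Send all traffic to your new deployment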
Once you're fully satisfied with your green deployment, switch all traffic to it.
In the endpoint Details page, Select Update traffic.
Adjust the deployment traffic by allocating 100% to the green deployment and 0% to the blue deployment.
Select Update.
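With the CLI, the equivalent update is:
az ml online-endpoint update --name $ENDPOINT_NAME --traffic "blue=0 green=100"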
Remove the old deployment
Use the following steps to delete an individual deployment from a managed online endpoint. Deleting an individual deployment doesn't affect the other deployments in the managed online endpoint:
You cannot delete a deployment that has live traffic allocated to it. You must first set traffic allocation for the deployment to 0% before deleting it.
In the endpoint Details page, find the blue deployment.
Select the delete icon next to the deployment name.
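With the CLI, you can delete the blue deployment once its traffic allocation is 0%:
az ml online-deployment delete --name blue --endpoint-name $ENDPOINT_NAME --yes --no-wait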
If you aren't going to use the endpoint and deployment, you should delete them. By deleting the endpoint, you'll also delete all its underlying deployments.
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait
If you aren't going to use the endpoint and deployment, you should delete them. By deleting the endpoint, you'll also delete all its underlying deployments.
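With the SDK, a minimal sketch:
# delete the endpoint; this also deletes all its underlying deployments
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)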