Upgrade deployment endpoints to SDK v2

08/11/2023

With SDK/CLI v1, you can deploy models on ACI or AKS as web services. Your existing v1 model deployments and web services will continue to function as they are, but using SDK/CLI v1 to deploy models on ACI or AKS as web services is now considered legacy. For new model deployments, we recommend upgrading to v2.

In v2, we offer managed online endpoints and Kubernetes online endpoints. For a comparison of v1 and v2, see Endpoints and deployment.

There are several deployment options: managed online endpoints and Kubernetes online endpoints (including Azure Kubernetes Service) in v2, and Azure Container Instances (ACI) and Azure Kubernetes Service (AKS) web services in v1. This article focuses on comparing deployment to ACI web services (v1) with deployment to managed online endpoints (v2).

Examples in this article show how to:

Deploy your model to Azure
Score using the endpoint
Delete the webservice/endpoint

Create inference resources

SDK v1

Configure a model, an environment, and a scoring script:

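# connect to the workspace (reads the config.json file downloaded for your workspace)
from azureml.core import Workspace
ws = Workspace.from_config()

# configure a model. example for registering a model
from azureml.core.model import Model
model = Model.register(ws, model_name="bidaf_onnx", model_path="./model.onnx")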

# configure an environment
from azureml.core import Environment
env = Environment(name='myenv')
python_packages = ['nltk', 'numpy', 'onnxruntime']
for package in python_packages:
    env.python.conda_dependencies.add_pip_package(package)

# configure an inference configuration with a scoring script
from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(
    environment=env,
    source_directory="./source_dir",
    entry_script="./score.py",
)

Configure and deploy an ACI webservice:

from azureml.core.webservice import AciWebservice

# define compute resources for ACI
deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=0.5, memory_gb=1, auth_enabled=True
)

# deploy the model as an ACI web service
service = Model.deploy(
    ws,
    "myservice",
    [model],
    inference_config,
    deployment_config,
    overwrite=True,
)

# block until the deployment finishes, printing progress logs
service.wait_for_deployment(show_output=True)

For more information on registering models, see Register a model from a local file.

SDK v2

Configure a model, an environment, and a scoring script:

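from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from azure.identity import DefaultAzureCredential

# connect to the workspace; replace the placeholders with your own values
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)

# configure a model
model = Model(path="../model-1/model/sklearn_regression_model.pkl")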

# configure an environment
from azure.ai.ml.entities import Environment
env = Environment(
    conda_file="../model-1/environment/conda.yml",
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1",
)

# configure the scoring code (the v2 counterpart of InferenceConfig)
from azure.ai.ml.entities import CodeConfiguration
code_config = CodeConfiguration(
    code="../model-1/onlinescoring", scoring_script="score.py"
)

Configure and create an online endpoint:

import datetime
from azure.ai.ml.entities import ManagedOnlineEndpoint

# create a unique endpoint name with current datetime to avoid conflicts
online_endpoint_name = "endpoint-" + datetime.datetime.now().strftime("%m%d%H%M%f")

# define an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is a sample online endpoint",
    auth_mode="key",
    tags={"foo": "bar"},
)

# create the endpoint and wait for the operation to finish
ml_client.begin_create_or_update(endpoint).result()
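Endpoint names must be unique within an Azure region, which is why the code above appends a timestamp. The helper below is a sketch that builds the same kind of name and checks it against the naming rule as we understand it (a leading letter, then letters, digits, or hyphens, 3 to 32 characters in total). The function name is hypothetical and the regular expression is an assumption; confirm it against the current endpoint naming documentation.

```python
import datetime
import re

# Assumed naming rule: a leading letter followed by letters, digits, or hyphens, 3-32 chars total.
ENDPOINT_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")


def make_endpoint_name(prefix="endpoint", now=None):
    """Return prefix-<month><day><hour><minute><microseconds>, validated against the assumed rule."""
    now = now or datetime.datetime.now()
    name = f"{prefix}-{now.strftime('%m%d%H%M%f')}"
    if not ENDPOINT_NAME_PATTERN.match(name):
        raise ValueError(f"invalid endpoint name: {name!r}")
    return name


print(make_endpoint_name(now=datetime.datetime(2023, 8, 11, 9, 30, 0, 123456)))
# prints endpoint-08110930123456
```

Validating locally surfaces a bad prefix (for example one that starts with a digit) before any request reaches the service.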

Configure and create an online deployment:

from azure.ai.ml.entities import ManagedOnlineDeployment

# define a deployment
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    environment=env,
    code_configuration=code_config,
    instance_type="Standard_F2s_v2",
    instance_count=1,
)

# create the deployment and wait for it to finish
ml_client.begin_create_or_update(blue_deployment).result()

# route 100% of the endpoint's traffic to the blue deployment
endpoint.traffic = {"blue": 100}
ml_client.begin_create_or_update(endpoint).result()
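Sending all traffic to `blue` is the usual starting point for a blue-green rollout: you later add a second deployment and move a share of traffic to it. The pure-Python sketch below handles only that bookkeeping. It assumes the percentages are whole numbers that must add up to 100, and the `green` deployment name and `shift_traffic` helper are hypothetical. You would assign the returned dictionary to `endpoint.traffic` and call `ml_client.begin_create_or_update(endpoint)` again.

```python
def shift_traffic(traffic, source, target, percent):
    """Move `percent` points of traffic from `source` to `target`, returning a new dict."""
    if percent < 0 or traffic.get(source, 0) < percent:
        raise ValueError(f"cannot move {percent}% from {source!r}")
    updated = dict(traffic)
    updated[source] = updated.get(source, 0) - percent
    updated[target] = updated.get(target, 0) + percent
    # Assumption: the endpoint expects whole-number percentages that total 100.
    total = sum(updated.values())
    if total != 100:
        raise ValueError(f"traffic must total 100, got {total}")
    return updated


print(shift_traffic({"blue": 100}, "blue", "green", 10))
# prints {'blue': 90, 'green': 10}
```

Returning a new dictionary instead of mutating `endpoint.traffic` in place keeps the previous split available if you need to roll back.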

For more information on concepts for endpoints and deployments, see What are online endpoints?

Submit a request

SDK v1

import json
data = {
    "query": "What color is the fox",
    "context": "The quick brown fox jumped over the lazy dog.",
}
data = json.dumps(data)
predictions = service.run(input_data=data)
print(predictions)
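`service.run` is a convenience wrapper; clients outside the SDK usually call the HTTP scoring endpoint directly. With `auth_enabled=True`, the v1 service exposes `service.scoring_uri` and `service.get_keys()`, and the key is sent as a bearer token. The standard-library sketch below only builds the request and does not send it; the URI and key are placeholders and the helper name is hypothetical. The same pattern should apply to a v2 endpoint created with `auth_mode="key"`, where the optional `azureml-model-deployment` header targets a single deployment.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, payload, deployment=None):
    """Build (but do not send) a key-authenticated JSON scoring request."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    if deployment:
        # v2 managed online endpoints: route the call to one named deployment.
        headers["azureml-model-deployment"] = deployment
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values; in v1 you would use service.scoring_uri and service.get_keys()[0].
req = build_scoring_request(
    "https://example.invalid/score",
    "<key>",
    {"query": "What color is the fox", "context": "The quick brown fox jumped over the lazy dog."},
)
# To actually send it: urllib.request.urlopen(req).read()
```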

SDK v2

# test the endpoint (the request will route to blue deployment as set above)
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    request_file="../model-1/sample-request.json",
)

# test the specific (blue) deployment
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="blue",
    request_file="../model-1/sample-request.json",
)
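`invoke` reads the payload from a file passed as `request_file`, whereas v1's `service.run` takes a JSON string. If your test data lives in a Python dictionary, one way to bridge the gap is to write it to a temporary JSON file first. The helper name is hypothetical, and the `{"data": [...]}` shape is an assumption about what the scoring script expects.

```python
import json
import os
import tempfile


def write_request_file(payload):
    """Serialize `payload` to a temporary .json file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    return path


request_path = write_request_file({"data": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]})
# ml_client.online_endpoints.invoke(endpoint_name=online_endpoint_name, request_file=request_path)
```

The caller owns the temporary file, so delete it with `os.remove(request_path)` once the call returns.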

Delete resources

SDK v1
```
service.delete()
```

SDK v2

ml_client.online_endpoints.begin_delete(name=online_endpoint_name)

Mapping of key functionality in SDK v1 and SDK v2

Functionality in SDK v1	Rough mapping in SDK v2
azureml.core.model.Model class	azure.ai.ml.entities.Model class
azureml.core.Environment class	azure.ai.ml.entities.Environment class
azureml.core.model.InferenceConfig class	azure.ai.ml.entities.CodeConfiguration class
azureml.core.webservice.AciWebservice class	azure.ai.ml.entities.ManagedOnlineDeployment class (and azure.ai.ml.entities.ManagedOnlineEndpoint class)
Model.deploy or Webservice.deploy	ml_client.begin_create_or_update(online_deployment)
Webservice.run	ml_client.online_endpoints.invoke
Webservice.delete	ml_client.online_endpoints.begin_delete
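When auditing an existing v1 code base, it can help to list where the v1 deployment APIs from the table are used. The sketch below scans Python source text for those names and reports the rough v2 counterpart. It is a plain text search, not a parser, so it can miss aliased imports or flag names that appear in comments; the function and dictionary names are hypothetical.

```python
import re

# Rough v1 -> v2 mapping, following the table above (not exhaustive).
V1_TO_V2 = {
    "InferenceConfig": "azure.ai.ml.entities.CodeConfiguration",
    "AciWebservice": "ManagedOnlineDeployment + ManagedOnlineEndpoint",
    "Model.deploy": "ml_client.begin_create_or_update(deployment)",
    "Webservice.deploy": "ml_client.begin_create_or_update(deployment)",
}

# Match each v1 name as a whole word (dots escaped), trying longer names first.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(V1_TO_V2, key=len, reverse=True)) + r")\b"
)


def find_v1_usages(source):
    """Return (line_number, v1_name, v2_suggestion) for each v1 name found in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, V1_TO_V2[name]))
    return hits


sample = (
    "from azureml.core.model import InferenceConfig\n"
    "service = Model.deploy(ws, 'svc', [m], ic, dc)\n"
    "cfg = AciWebservice.deploy_configuration(cpu_cores=0.5)\n"
)
for hit in find_v1_usages(sample):
    print(hit)
```

Because of the word boundaries, `AciWebservice.deploy_configuration` is reported once as `AciWebservice` rather than also matching `Webservice.deploy`.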

For more information, see

v2 docs:

v1 docs:
