Model training on serverless compute

10/21/2024

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

You no longer need to create and manage compute to train your model in a scalable way. Instead, you can submit your job to a compute target type called serverless compute. Serverless compute is a fully managed, on-demand compute, and it's the easiest way to run training jobs on Azure Machine Learning: Azure Machine Learning creates, scales, and manages the compute for you. With serverless compute, machine learning professionals can focus on building machine learning models instead of learning about compute infrastructure or setting it up.

Machine learning professionals can specify the resources the job needs. Azure Machine Learning manages the compute infrastructure and provides managed network isolation, which reduces the burden on you.

Enterprises can also reduce costs by specifying optimal resources for each job. IT admins can still retain control by specifying core quotas at the subscription and workspace levels and by applying Azure policies.

Serverless compute can be used to fine-tune models in the model catalog, such as LLAMA 2. It can run all types of jobs from Azure Machine Learning studio, the SDK, and the CLI. It can also be used for building environment images and for responsible AI dashboard scenarios. Serverless jobs consume the same quota as Azure Machine Learning compute. You can choose the standard (dedicated) tier or spot (low-priority) VMs. Managed identity and user identity are supported for serverless jobs. The billing model is the same as for Azure Machine Learning compute.

Advantages of serverless compute

Azure Machine Learning manages creating, setting up, scaling, deleting, and patching the compute infrastructure, which reduces management overhead.
You don't need to learn about compute, various compute types, and related properties.
There's no need to repeatedly create clusters for each VM size needed, using the same settings, and replicating them for each workspace.
You can optimize costs by specifying the exact resources each job needs at runtime in terms of instance type (VM size) and instance count. You can monitor the utilization metrics of the job to optimize the resources a job would need.
Fewer steps are involved in running a job.
To further simplify job submission, you can skip the resources altogether. Azure Machine Learning defaults the instance count and chooses an instance type (VM size) based on factors like quota, cost, performance and disk size.
Shorter wait times before jobs start executing in some cases.
User identity and workspace user-assigned managed identity are supported for job submission.
With managed network isolation, you can streamline and automate your network isolation configuration. Customer virtual networks are also supported.
Admin control through quota and Azure policies.

How to use serverless compute

You can fine-tune foundation models such as LLAMA 2 using notebooks as shown below:
- Fine Tune LLAMA 2
- Fine Tune LLAMA 2 using multiple nodes
When you create your own compute cluster, you use its name in the command job, such as compute="cpu-cluster". With serverless, you can skip creation of a compute cluster, and omit the compute parameter to instead use serverless compute. When compute isn't specified for a job, the job runs on serverless compute. Omit the compute name in your CLI or SDK jobs to use serverless compute in the following job types and optionally provide resources a job would need in terms of instance count and instance type:
- Command jobs, including interactive jobs and distributed training
- AutoML jobs
- Sweep jobs
- Parallel jobs
For pipeline jobs through the CLI, use default_compute: azureml:serverless for the pipeline-level default compute. For pipeline jobs through the SDK, use default_compute="serverless". See Pipeline job for an example.
When you submit a training job in studio (preview), select Serverless as the compute type.
When using Azure Machine Learning designer, select Serverless as default compute.
You can use serverless compute for the responsible AI dashboard:
- AutoML Image Classification scenario with RAI Dashboard
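
The defaulting rule described in this section can be illustrated with a small, SDK-free sketch. It only models the behavior stated above (a standalone job without a compute name runs on serverless compute, and a pipeline can name serverless as its default). The function name resolve_compute and the dictionary layout are hypothetical and are not part of the Azure Machine Learning SDK; treating a step-level compute as taking precedence over the pipeline default is an assumption of this sketch.

```python
from typing import Optional

SERVERLESS = "serverless"  # label used by this sketch only


def resolve_compute(job: dict, pipeline_default: Optional[str] = None) -> str:
    """Return the compute target a job would use under the rule described above.

    job: a plain dict standing in for a job definition (hypothetical layout).
    pipeline_default: the pipeline's default_compute value, if the job is a
        pipeline step, for example "azureml:serverless" (CLI) or "serverless" (SDK).
    """
    compute = job.get("compute")
    if compute:
        # A compute name was given explicitly, such as "cpu-cluster".
        return compute
    if pipeline_default:
        # Assumption: a step without its own compute uses the pipeline default.
        prefix = "azureml:"
        if pipeline_default.startswith(prefix):
            return pipeline_default[len(prefix):]
        return pipeline_default
    # Standalone job with no compute specified: serverless compute is used.
    return SERVERLESS


if __name__ == "__main__":
    print(resolve_compute({"command": "echo 'hello world'"}))
    print(resolve_compute({"command": "echo 'hello world'", "compute": "cpu-cluster"}))
    print(resolve_compute({"command": "python train.py"}, pipeline_default="azureml:serverless"))
```

The sketch is only a reading aid: in practice the service applies this rule when you submit the job, so there's nothing to resolve on the client side.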

Performance considerations

Serverless compute can help speed up your training in the following ways:

Insufficient quota: When you create your own compute cluster, you're responsible for figuring out what VM size and node count to create. When your job runs, if you don't have sufficient quota for the cluster, the job fails. Serverless compute uses information about your quota to select an appropriate VM size by default.

Scale down optimization: When a compute cluster is scaling down, a new job has to wait for the scale down to happen and then for the cluster to scale up before the job can run. With serverless compute, you don't have to wait for scale down and your job can start running on another cluster/node (assuming you have quota).

Cluster busy optimization: When a job is running on a compute cluster and another job is submitted, your job is queued behind the currently running job. With serverless compute, you get another node/another cluster to start running the job (assuming you have quota).

Quota

When submitting the job, you still need sufficient Azure Machine Learning compute quota to proceed (both workspace and subscription level quota). The default VM size for serverless jobs is selected based on this quota. If you specify your own VM size/family:

If you have some quota for your VM size/family, but not sufficient quota for the number of instances, you see an error. The error recommends decreasing the number of instances to a valid number based on your quota limit, requesting a quota increase for this VM family, or changing the VM size.
If you don't have quota for your specified VM size, you see an error. The error recommends selecting a different VM size for which you do have quota or requesting quota for this VM family.
If you do have sufficient quota for the VM family to run the serverless job, but other jobs are using the quota, you get a message that your job must wait in a queue until quota is available.

When you view your usage and quota in the Azure portal, the name "Serverless" shows all the quota consumed by serverless jobs.
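
The three quota outcomes listed above can be sketched as a small decision function. This is a simplified model rather than the service's actual logic: it assumes quota is counted in cores per VM family, and that "no quota for the VM size" means the limit can't fit even one instance. The names check_quota, Outcome, and QuotaCheck are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    RUN = "run"                            # enough free quota, job can start
    QUEUED = "queued"                      # quota exists but other jobs are using it
    REDUCE_INSTANCES = "reduce_instances"  # some quota, but not for this many instances
    NO_QUOTA = "no_quota"                  # limit can't fit a single instance


@dataclass
class QuotaCheck:
    outcome: Outcome
    max_instances: int  # largest instance count the quota limit allows


def check_quota(limit_cores: int, cores_in_use: int,
                cores_per_instance: int, instance_count: int) -> QuotaCheck:
    """Classify a request against a core quota (assumed model, see lead-in)."""
    if cores_per_instance <= 0 or instance_count <= 0:
        raise ValueError("cores_per_instance and instance_count must be positive")
    max_instances = limit_cores // cores_per_instance
    required = cores_per_instance * instance_count
    if max_instances == 0:
        return QuotaCheck(Outcome.NO_QUOTA, 0)
    if required > limit_cores:
        return QuotaCheck(Outcome.REDUCE_INSTANCES, max_instances)
    if required > limit_cores - cores_in_use:
        return QuotaCheck(Outcome.QUEUED, max_instances)
    return QuotaCheck(Outcome.RUN, max_instances)


if __name__ == "__main__":
    # 24-core limit, 4-core VM size, 8 instances requested: too many instances.
    print(check_quota(limit_cores=24, cores_in_use=0,
                      cores_per_instance=4, instance_count=8))
```

A pre-check like this can help you pick a valid instance count before submitting, but the error or queue message returned by the service remains the authoritative answer.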

Identity support and credential pass through

User credential pass through: Serverless compute fully supports user credential pass through. The user token of the user who is submitting the job is used for storage access. These credentials are from your Microsoft Entra ID.

Python SDK
Azure CLI

from azure.ai.ml import command
from azure.ai.ml import MLClient     # Handle to the workspace
from azure.identity import DefaultAzureCredential     # Authentication package
from azure.ai.ml.entities import ResourceConfiguration
from azure.ai.ml.entities import UserIdentityConfiguration 

credential = DefaultAzureCredential()
# Get a handle to the workspace. You can find the info on the workspace tab on studio.ml.azure.cn
ml_client = MLClient(
    credential=credential,
    subscription_id="<Azure subscription id>", 
    resource_group_name="<Azure resource group>",
    workspace_name="<Azure Machine Learning Workspace>",
)
job = command(
    command="echo 'hello world'",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
    identity=UserIdentityConfiguration(),
)
# submit the command job
ml_client.create_or_update(job)

Create a file named hello.yaml with the following content:

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world"
environment:
  image: library/python:latest
identity:
  type: user_identity

Submit the job with the following command:

az ml job create --file hello.yaml --resource-group my-resource-group --workspace-name my-workspace

The rest of the CLI examples show variations of the hello.yaml file. Run each of them in the same way.

User-assigned managed identity: When you have a workspace configured with user-assigned managed identity, you can use that identity with the serverless job for storage access. To access secrets, see Use authentication credential secrets in Azure Machine Learning jobs.

Python SDK
Azure CLI

from azure.ai.ml import command
from azure.ai.ml import MLClient     # Handle to the workspace
from azure.identity import DefaultAzureCredential    # Authentication package
from azure.ai.ml.entities import ResourceConfiguration
from azure.ai.ml.entities import ManagedIdentityConfiguration

credential = DefaultAzureCredential()
# Get a handle to the workspace. You can find the info on the workspace tab on studio.ml.azure.cn
ml_client = MLClient(
    credential=credential,
    subscription_id="<Azure subscription id>", 
    resource_group_name="<Azure resource group>",
    workspace_name="<Azure Machine Learning Workspace>",
)
job = command(
    command="echo 'hello world'",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
    identity=ManagedIdentityConfiguration(),
)
# submit the command job
ml_client.create_or_update(job)

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world"
environment:
  image: library/python:latest
identity:
  type: managed

For information on attaching user-assigned managed identity, see attach user assigned managed identity.

Configure properties for command jobs

If no compute target is specified for command, sweep, and AutoML jobs then the compute defaults to serverless compute. For instance, for this command job:

Python SDK
Azure CLI

from azure.ai.ml import command
from azure.ai.ml import MLClient # Handle to the workspace
from azure.identity import DefaultAzureCredential # Authentication package

credential = DefaultAzureCredential()
# Get a handle to the workspace. You can find the info on the workspace tab on studio.ml.azure.cn
ml_client = MLClient(
    credential=credential,
    subscription_id="<Azure subscription id>", 
    resource_group_name="<Azure resource group>",
    workspace_name="<Azure Machine Learning Workspace>",
)
job = command(
    command="echo 'hello world'",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
)
# submit the command job
ml_client.create_or_update(job)

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world"
environment:
  image: library/python:latest

The compute defaults to serverless compute with:

Single node for this job. The default number of nodes is based on the type of job. See following sections for other job types.
CPU virtual machine, which is determined based on quota, performance, cost, and disk size.
Dedicated virtual machines
Workspace location

You can override these defaults. If you want to specify the VM type or number of nodes for serverless compute, add resources to your job:

instance_type to choose a specific VM. Use this parameter if you want a specific CPU/GPU VM size.

instance_count to specify the number of nodes.

Python SDK
Azure CLI

from azure.ai.ml import command 
from azure.ai.ml import MLClient # Handle to the workspace
from azure.identity import DefaultAzureCredential # Authentication package
from azure.ai.ml.entities import JobResourceConfiguration 

credential = DefaultAzureCredential()
# Get a handle to the workspace. You can find the info on the workspace tab on studio.ml.azure.cn
ml_client = MLClient(
    credential=credential,
    subscription_id="<Azure subscription id>", 
    resource_group_name="<Azure resource group>",
    workspace_name="<Azure Machine Learning Workspace>",
)
job = command(
    command="echo 'hello world'",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
    resources=JobResourceConfiguration(instance_type="Standard_NC24", instance_count=4),
)
# submit the command job
ml_client.create_or_update(job)

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world"
environment:
  image: library/python:latest
resources:
  instance_count: 4
  instance_type: Standard_NC24

To change the job tier, use queue_settings to choose between dedicated VMs (job_tier: Standard) and low-priority VMs (job_tier: Spot).

Python SDK
Azure CLI

from azure.ai.ml import command
from azure.ai.ml import MLClient    # Handle to the workspace
from azure.identity import DefaultAzureCredential    # Authentication package
credential = DefaultAzureCredential()
# Get a handle to the workspace. You can find the info on the workspace tab on studio.ml.azure.cn
ml_client = MLClient(
    credential=credential,
    subscription_id="<Azure subscription id>", 
    resource_group_name="<Azure resource group>",
    workspace_name="<Azure Machine Learning Workspace>",
)
job = command(
    command="echo 'hello world'",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
    queue_settings={
      "job_tier": "spot"  
    }
)
# submit the command job
ml_client.create_or_update(job)

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
component: ./train.yml 
queue_settings:
  job_tier: Standard # Possible values are Standard (dedicated) and Spot (low priority). Default is Standard.

Example for all fields with command jobs

Here's an example with all fields specified, including the identity the job should use. There's no need to specify virtual network settings because workspace-level managed network isolation is used automatically.

Python SDK
Azure CLI

from azure.ai.ml import command
from azure.ai.ml import MLClient      # Handle to the workspace
from azure.identity import DefaultAzureCredential     # Authentication package
from azure.ai.ml.entities import ResourceConfiguration
from azure.ai.ml.entities import UserIdentityConfiguration 

credential = DefaultAzureCredential()
# Get a handle to the workspace. You can find the info on the workspace tab on studio.ml.azure.cn
ml_client = MLClient(
    credential=credential,
    subscription_id="<Azure subscription id>", 
    resource_group_name="<Azure resource group>",
    workspace_name="<Azure Machine Learning Workspace>",
)
job = command(
    command="echo 'hello world'",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
    identity=UserIdentityConfiguration(),
    queue_settings={
      "job_tier": "Standard"  
    }
)
job.resources = ResourceConfiguration(instance_type="Standard_E4s_v3", instance_count=1)
# submit the command job
ml_client.create_or_update(job)

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: echo "hello world"
environment:
  image: library/python:latest
queue_settings:
  job_tier: Standard # Possible values are Standard, Spot. Default is Standard.
identity:
  type: user_identity #Possible values are Managed, user_identity
resources:
  instance_count: 1
  instance_type: Standard_E4s_v3
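
To make it clearer which of these fields are optional, here is a minimal sketch that assembles the same job definition as a plain Python dictionary. It isn't part of the Azure Machine Learning SDK or CLI: build_job_spec is a hypothetical helper, and the field names simply mirror the YAML above. The dictionary never contains a compute key, which is what selects serverless compute.

```python
import json
from typing import Optional


def build_job_spec(command: str,
                   image: str,
                   instance_type: Optional[str] = None,
                   instance_count: Optional[int] = None,
                   job_tier: Optional[str] = None,
                   identity_type: Optional[str] = None) -> dict:
    """Build a dict mirroring the command job YAML fields shown above.

    Only command and environment are always present. Everything else is
    added only when the caller overrides the serverless defaults.
    """
    spec = {"command": command, "environment": {"image": image}}

    resources = {}
    if instance_count is not None:
        resources["instance_count"] = instance_count
    if instance_type is not None:
        resources["instance_type"] = instance_type
    if resources:
        spec["resources"] = resources

    if job_tier is not None:
        spec["queue_settings"] = {"job_tier": job_tier}
    if identity_type is not None:
        spec["identity"] = {"type": identity_type}
    return spec


if __name__ == "__main__":
    # All defaults: no resources, tier, or identity are emitted.
    minimal = build_job_spec('echo "hello world"', "library/python:latest")
    # All fields specified, matching the example above.
    full = build_job_spec('echo "hello world"', "library/python:latest",
                          instance_type="Standard_E4s_v3", instance_count=1,
                          job_tier="Standard", identity_type="user_identity")
    # Printed as JSON for inspection only.
    print(json.dumps(minimal, indent=2))
    print(json.dumps(full, indent=2))
```

Comparing the two printed dictionaries shows that resources, queue_settings, and identity are independent overrides layered on top of the serverless defaults.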

View more examples of training with serverless compute at:-

AutoML job

There's no need to specify compute for AutoML jobs. Resources can be optionally specified. If the instance count isn't specified, it defaults based on the max_concurrent_trials and max_nodes parameters. If you submit an AutoML image classification or NLP task with no instance type, a GPU VM size is automatically selected. You can submit AutoML jobs through the CLI, the SDK, or studio. To submit AutoML jobs with serverless compute in studio, first enable the submit a training job in studio (preview) feature in the preview panel.

Python SDK
Azure CLI

If you want to specify the instance type or instance count, use the ResourceConfiguration class.

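from azure.ai.ml import automl
from azure.ai.ml.automl import ClassificationModels
from azure.ai.ml.entities import ResourceConfiguration

# Create the AutoML classification job with the related factory function.
# exp_name, my_training_data_input, and max_trials are assumed to be defined earlier.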

classification_job = automl.classification(
    experiment_name=exp_name,
    training_data=my_training_data_input,
    target_column_name="y",
    primary_metric="accuracy",
    n_cross_validations=5,
    enable_model_explainability=True,
    tags={"my_custom_tag": "My custom value"},
)

# Limits are all optional
classification_job.set_limits(
    timeout_minutes=600,
    trial_timeout_minutes=20,
    max_trials=max_trials,
    # max_concurrent_trials = 4,
    # max_cores_per_trial=-1,
    enable_early_termination=True,
)

# Training properties are optional
classification_job.set_training(
    blocked_training_algorithms=[ClassificationModels.LOGISTIC_REGRESSION],
    enable_onnx_compatible_models=True,
)

# Serverless compute resources used to run the job
classification_job.resources = ResourceConfiguration(instance_type="Standard_E4s_v3", instance_count=6)

If you want to specify the instance type or instance count, add a resources section.

$schema: https://azuremlsdk2.blob.core.windows.net/preview/0.0.1/autoMLJob.schema.json
type: automl
experiment_name: dpv2-cli-automl-classifier-experiment
description: A Classification job using bank marketing
# Serverless compute is used to run this AutoML job. 
# Through serverless compute, Azure Machine Learning takes care of creating, scaling, deleting, patching and managing compute, along with providing managed network isolation, reducing the burden on you.

task: classification
log_verbosity: debug
primary_metric: accuracy

target_column_name: "y"

#validation_data_size: 0.20
#n_cross_validations: 5
#test_data_size: 0.1

training_data:
  path: "./training-mltable-folder"
  type: mltable
validation_data:
  path: "./validation-mltable-folder"
  type: mltable
test_data:
  path: "./test-mltable-folder"
  type: mltable

limits:
  timeout_minutes: 180
  max_trials: 40
  max_concurrent_trials: 5
  trial_timeout_minutes: 20
  enable_early_termination: true
  exit_score: 0.92

featurization:
  mode: custom
  transformer_params:
    imputer:
      - fields: ["job"]
        parameters:
          strategy: most_frequent
  blocked_transformers:
    - WordEmbedding
training:
  enable_model_explainability: true
  allowed_training_algorithms:
    - gradient_boosting
    - logistic_regression
# Resources to run this serverless job
resources:
  instance_type: Standard_E4s_v3
  instance_count: 5

Pipeline job

For a pipeline job in the Python SDK, specify "serverless" as your default compute type to use serverless compute.

# Construct pipeline
@pipeline()
def pipeline_with_components_from_yaml(
    training_input,
    test_input,
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
):
    """E2E dummy train-score-eval pipeline with components defined via yaml."""
    # Call component obj as function: apply given inputs & parameters to create a node in pipeline
    train_with_sample_data = train_model(
        training_data=training_input,
        max_epochs=training_max_epochs,
        learning_rate=training_learning_rate,
        learning_rate_schedule=learning_rate_schedule,
    )

    score_with_sample_data = score_data(
        model_input=train_with_sample_data.outputs.model_output, test_data=test_input
    )
    score_with_sample_data.outputs.score_output.mode = "upload"

    eval_with_sample_data = eval_model(
        scoring_result=score_with_sample_data.outputs.score_output
    )

    # Return: pipeline outputs
    return {
        "trained_model": train_with_sample_data.outputs.model_output,
        "scored_data": score_with_sample_data.outputs.score_output,
        "evaluation_report": eval_with_sample_data.outputs.eval_output,
    }


pipeline_job = pipeline_with_components_from_yaml(
    training_input=Input(type="uri_folder", path=parent_dir + "/data/"),
    test_input=Input(type="uri_folder", path=parent_dir + "/data/"),
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
)

# set pipeline to use serverless compute
pipeline_job.settings.default_compute = "serverless"

For a pipeline job in the Azure CLI, specify azureml:serverless as your default compute type to use serverless compute.

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: 1b_e2e_registered_components
description: E2E dummy train-score-eval pipeline with registered components
# Serverless compute is used to run this pipeline job. 
# Through serverless compute, Azure Machine Learning takes care of creating, scaling, deleting, patching and managing compute, along with providing managed network isolation, reducing the burden on you.
inputs:
  pipeline_job_training_max_epocs: 20
  pipeline_job_training_learning_rate: 1.8
  pipeline_job_learning_rate_schedule: 'time-based'

outputs: 
  pipeline_job_trained_model:
    mode: upload
  pipeline_job_scored_data:
    mode: upload
  pipeline_job_evaluation_report:
    mode: upload

settings:
  default_compute: azureml:serverless

jobs:
  train_job:
    type: command
    component: azureml:my_train@latest
    inputs:
      training_data: 
        type: uri_folder 
        path: ./data      
      max_epocs: ${{parent.inputs.pipeline_job_training_max_epocs}}
      learning_rate: ${{parent.inputs.pipeline_job_training_learning_rate}}
      learning_rate_schedule: ${{parent.inputs.pipeline_job_learning_rate_schedule}}
    outputs:
      model_output: ${{parent.outputs.pipeline_job_trained_model}}
    services:
      my_vscode:
        type: vs_code
      my_jupyter_lab:
        type: jupyter_lab
      my_tensorboard:
        type: tensor_board
        log_dir: "outputs/tblogs"
    #  my_ssh:
    #    type: ssh
    #    ssh_public_keys: <paste the entire pub key content>
    #    nodes: all # Use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` are not selected, by default, interactive applications are only enabled on the head node.

  score_job:
    type: command
    component: azureml:my_score@latest
    inputs:
      model_input: ${{parent.jobs.train_job.outputs.model_output}}
      test_data: 
        type: uri_folder 
        path: ./data
    outputs:
      score_output: ${{parent.outputs.pipeline_job_scored_data}}

  evaluate_job:
    type: command
    component: azureml:my_eval@latest
    inputs:
      scoring_result: ${{parent.jobs.score_job.outputs.score_output}}
    outputs:
      eval_output: ${{parent.outputs.pipeline_job_evaluation_report}}

You can also set serverless compute as the default compute in Designer.

Next steps