Schedule machine learning pipeline jobs
APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)
In this article, you'll learn how to programmatically schedule a pipeline to run on Azure and how to use the schedule UI to do the same. You can create a schedule based on elapsed time. Time-based schedules can take care of routine tasks, such as retraining models or running batch predictions regularly to keep them up to date. After learning how to create schedules, you'll learn how to retrieve, update, and deactivate them via the CLI, SDK, and studio UI.
Prerequisites
- You must have an Azure subscription to use Azure Machine Learning. If you don't have an Azure subscription, create a trial subscription before you begin. Try the trial subscription today.
- Install the Azure CLI and the ml extension. Follow the installation steps in Install, set up, and use the CLI (v2).
- Create an Azure Machine Learning workspace if you don't have one. For workspace creation, see Install, set up, and use the CLI (v2).
Schedule a pipeline job
To run a pipeline job on a recurring basis, you need to create a schedule. A Schedule associates a job with a trigger. The trigger can be either cron, which uses a cron expression to describe the wait between runs, or recurrence, which specifies the frequency at which to trigger the job. In each case, you need to define a pipeline job first. It can be an existing pipeline job or a pipeline job defined inline; refer to Create a pipeline job in CLI and Create a pipeline job in SDK.
You can schedule a pipeline job from a local YAML file or an existing pipeline job in the workspace.
Create a schedule
Create a time-based schedule with recurrence pattern
APPLIES TO: Azure CLI ml extension v2 (current)
$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: simple_recurrence_job_schedule
display_name: Simple recurrence job schedule
description: a simple hourly recurrence job schedule
trigger:
  type: recurrence
  frequency: day #can be minute, hour, day, week, month
  interval: 1 #every day
  schedule:
    hours: [4,5,10,11,12]
    minutes: [0,30]
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC
create_job: ./simple-pipeline-job.yml
# create_job: azureml:simple-pipeline-job
trigger contains the following properties:
- (Required) type specifies the schedule type, which is recurrence here. It can also be cron; see details in the next section.
Note
The following properties apply to both the CLI and the SDK.
- (Required) frequency specifies the unit of time that describes how often the schedule fires. Can be minute, hour, day, week, or month.
- (Required) interval specifies how often the schedule fires based on the frequency, that is, the number of time units to wait until the schedule fires again.
- (Optional) schedule defines the recurrence pattern, containing hours, minutes, and weekdays.
  - When frequency is day, the pattern can specify hours and minutes.
  - When frequency is week or month, the pattern can specify hours, minutes, and weekdays.
  - hours should be an integer or a list, from 0 to 23.
  - minutes should be an integer or a list, from 0 to 59.
  - weekdays can be a string or a list, from monday to sunday.
  - If schedule is omitted, jobs are triggered according to start_time, frequency, and interval.
- (Optional) start_time describes the start date and time with timezone. If start_time is omitted, it defaults to the job creation time. If the start time is in the past, the first job runs at the next calculated run time.
- (Optional) end_time describes the end date and time with timezone. If end_time is omitted, the schedule continues to trigger jobs until it's manually disabled.
- (Optional) time_zone specifies the time zone of the recurrence. If omitted, the default is UTC. To learn more about timezone values, see appendix for timezone values.
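To make the recurrence pattern concrete, the following Python sketch expands a daily pattern into the times at which it would fire on one day. It assumes that every value in hours is combined with every value in minutes, and it works with naive local times in the schedule's time_zone. The helper name daily_fire_times is hypothetical and is not part of the CLI or SDK; the service's own calculation isn't shown in this article.

```python
from datetime import date, datetime
from itertools import product


def daily_fire_times(day, hours, minutes):
    """Return every hour/minute combination on `day`, in chronological order.

    Assumption: each hour is paired with each minute (a cross product).
    """
    if not all(0 <= h <= 23 for h in hours):
        raise ValueError("hours must be in the range 0-23")
    if not all(0 <= m <= 59 for m in minutes):
        raise ValueError("minutes must be in the range 0-59")
    return sorted(
        datetime(day.year, day.month, day.day, h, m)
        for h, m in product(hours, minutes)
    )


# Pattern from the sample YAML: hours [4,5,10,11,12], minutes [0,30].
times = daily_fire_times(date(2022, 7, 11), [4, 5, 10, 11, 12], [0, 30])
print(len(times))  # 10
for t in times:
    print(t.isoformat())
```

Under that assumption, the sample schedule fires ten times a day, from 04:00 through 12:30 in the configured time zone.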
Create a time-based schedule with cron expression
APPLIES TO: Azure CLI ml extension v2 (current)
$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: simple_cron_job_schedule
display_name: Simple cron job schedule
description: a simple hourly cron job schedule
trigger:
  type: cron
  expression: "0 * * * *"
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC
# create_job: azureml:simple-pipeline-job
create_job: ./simple-pipeline-job.yml
The trigger section defines the schedule details and contains the following properties:
- (Required) type specifies the schedule type, which is cron here.
- (Required) expression uses a standard crontab expression to express a recurring schedule. A single expression is composed of five space-delimited fields: MINUTES HOURS DAYS MONTHS DAYS-OF-WEEK
  - A single wildcard (*) covers all values for the field. So a * in days means all days of a month (which varies with month and year).
  - The expression "0 * * * *" in the sample above fires at minute 0 of every hour. As another example, expression: "15 16 * * 1" means 16:15 (4:15 PM) every Monday.
  - The table below lists the valid values for each field:
Field | Range | Comment |
---|---|---|
MINUTES | 0-59 | - |
HOURS | 0-23 | - |
DAYS | - | Not supported. The value is ignored and treated as *. |
MONTHS | - | Not supported. The value is ignored and treated as *. |
DAYS-OF-WEEK | 0-6 | Zero (0) means Sunday. Names of days are also accepted. |
  - To learn more about how to use crontab expressions, see Crontab Expression wiki on GitHub.
Important
DAYS and MONTHS are not supported. If you pass a value, it is ignored and treated as *.
- (Optional) start_time specifies the start date and time with timezone of the schedule. start_time: "2022-05-10T10:15:00-04:00" means the schedule starts from 10:15:00 AM on 2022-05-10 in the UTC-4 timezone. If start_time is omitted, it defaults to the schedule creation time. If the start time is in the past, the first job runs at the next calculated run time.
- (Optional) end_time describes the end date and time with timezone. If end_time is omitted, the schedule continues to trigger jobs until it's manually disabled.
- (Optional) time_zone specifies the time zone of the expression. If omitted, the default is UTC. See appendix for timezone values.
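Because values in the DAYS and MONTHS fields are ignored, an expression may not behave the way it reads. The following Python sketch shows what an expression is effectively treated as under that rule. The function normalize_cron is a hypothetical helper for illustration only; it is not part of the CLI or SDK, and it only checks that the expression has five fields.

```python
def normalize_cron(expression):
    """Return the expression with the DAYS and MONTHS fields replaced by '*'.

    Field order: MINUTES HOURS DAYS MONTHS DAYS-OF-WEEK.
    """
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 space-delimited fields, got {len(fields)}")
    minutes, hours, _days, _months, days_of_week = fields
    return " ".join([minutes, hours, "*", "*", days_of_week])


# "16:15 on the 1st of June, if it is a Monday" is read as "16:15 every Monday".
print(normalize_cron("15 16 1 6 1"))  # 15 16 * * 1
```

Running an expression through a check like this before creating the schedule can reveal a DAYS or MONTHS value that would otherwise be silently dropped.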
Limitations:
- Currently, Azure Machine Learning v2 schedules don't support event-based triggers.
- You can specify a complex recurrence pattern containing multiple trigger timestamps using the Azure Machine Learning SDK/CLI v2, while the UI only displays the complex pattern and doesn't support editing it.
- If you set the recurrence as the 31st day of every month, the schedule won't trigger jobs in months with fewer than 31 days.
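As an illustration of the last limitation, this Python sketch lists the months in a given year that a day-of-month recurrence would skip. It relies only on the standard calendar module; months_missing_day is a hypothetical helper name, not a service API.

```python
import calendar


def months_missing_day(year, day):
    """Return the month numbers in `year` that have fewer than `day` days."""
    return [
        month
        for month in range(1, 13)
        if calendar.monthrange(year, month)[1] < day
    ]


# Months in which a "31st of every month" recurrence would not trigger a job.
print(months_missing_day(2023, 31))  # [2, 4, 6, 9, 11]
```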
Change runtime settings when defining schedule
When defining a schedule using an existing job, you can change the runtime settings of the job. Using this approach, you can define multiple schedules that use the same job with different inputs.
APPLIES TO: Azure CLI ml extension v2 (current)
$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: cron_with_settings_job_schedule
display_name: Simple cron job schedule
description: a simple hourly cron job schedule
trigger:
  type: cron
  expression: "0 * * * *"
  start_time: "2022-07-10T10:00:00" # optional - default will be schedule creation time
  time_zone: "Pacific Standard Time" # optional - default will be UTC
create_job:
  type: pipeline
  job: ./simple-pipeline-job.yml
  # job: azureml:simple-pipeline-job
  # runtime settings
  settings:
    #default_compute: azureml:cpu-cluster
    continue_on_step_failure: true
  inputs:
    hello_string_top_level_input: ${{name}}
  tags:
    schedule: cron_with_settings_schedule
The following properties can be changed when defining a schedule:
Property | Description |
---|---|
settings | A dictionary of settings to be used when running the pipeline job. |
inputs | A dictionary of inputs to be used when running the pipeline job. |
outputs | A dictionary of outputs to be used when running the pipeline job. |
experiment_name | Experiment name of triggered job. |
Note
Studio UI users can only modify input, output, and runtime settings when creating a schedule. experiment_name can only be changed using the CLI or SDK.
Expressions supported in schedule
When you define a schedule, the following expressions are supported. They are resolved to real values at job runtime.
Expression | Description | Supported properties |
---|---|---|
${{creation_context.trigger_time}} | The time when the schedule is triggered. | String type inputs of pipeline job |
${{name}} | The name of job. | outputs.path of pipeline job |
Manage schedule
Create schedule
APPLIES TO: Azure CLI ml extension v2 (current)
After you create the schedule yaml, you can use the following command to create a schedule via CLI.
# This action will create related resources for a schedule. It will take dozens of seconds to complete.
az ml schedule create --file cron-schedule.yml --no-wait
List schedules in a workspace
APPLIES TO: Azure CLI ml extension v2 (current)
az ml schedule list
Check schedule detail
APPLIES TO: Azure CLI ml extension v2 (current)
az ml schedule show -n simple_cron_job_schedule
Update a schedule
APPLIES TO: Azure CLI ml extension v2 (current)
az ml schedule update -n simple_cron_schedule --set description="new description" --no-wait
Note
If you would like to update more than just tags/description, it is recommended to use az ml schedule create --file update_schedule.yml
Disable a schedule
APPLIES TO: Azure CLI ml extension v2 (current)
az ml schedule disable -n simple_cron_schedule --no-wait
Enable a schedule
APPLIES TO: Azure CLI ml extension v2 (current)
az ml schedule enable -n simple_cron_schedule --no-wait
Query triggered jobs from a schedule
The display name of every job triggered by a schedule has the form <schedule_name>-YYYYMMDDThhmmssZ. For example, if a schedule named named-schedule is created to run every 12 hours starting at 6 AM on Jan 1 2021, the display names of the jobs created are as follows:
- named-schedule-20210101T060000Z
- named-schedule-20210101T180000Z
- named-schedule-20210102T060000Z
- named-schedule-20210102T180000Z, and so on
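The naming convention above can be reproduced with a short Python sketch. It assumes the first run time is already expressed in UTC (the trailing Z) and that runs are spaced by a fixed number of hours. The function triggered_display_names is a hypothetical helper, not part of the SDK.

```python
from datetime import datetime, timedelta


def triggered_display_names(schedule_name, first_run_utc, interval_hours, count):
    """Build display names in the <schedule_name>-YYYYMMDDThhmmssZ format."""
    names = []
    for i in range(count):
        run_time = first_run_utc + timedelta(hours=interval_hours * i)
        names.append(f"{schedule_name}-{run_time.strftime('%Y%m%dT%H%M%SZ')}")
    return names


# Every 12 hours starting at 6 AM on Jan 1 2021, as in the example above.
for name in triggered_display_names("named-schedule", datetime(2021, 1, 1, 6), 12, 4):
    print(name)
```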
You can also apply an Azure CLI JMESPath query to find the jobs triggered by a schedule, using the schedule name.
# query triggered jobs from a schedule; replace simple_cron_schedule with your schedule name
az ml job list --query "[?contains(display_name,'simple_cron_schedule')]"
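Note that contains matches substrings, so the query above also returns jobs from any other schedule whose name contains simple_cron_schedule. If you need an exact match, one option is to post-process the command's JSON output. The Python sketch below assumes jobs is the parsed output of az ml job list, that is, a list of objects that each carry a display_name field. The function jobs_for_schedule is a hypothetical helper.

```python
import re


def jobs_for_schedule(jobs, schedule_name):
    """Keep only jobs whose display name is <schedule_name>-YYYYMMDDThhmmssZ."""
    pattern = re.compile(rf"^{re.escape(schedule_name)}-\d{{8}}T\d{{6}}Z$")
    return [job for job in jobs if pattern.match(job.get("display_name") or "")]


# Illustrative sample data, not real command output.
jobs = [
    {"display_name": "simple_cron_schedule-20220710T170000Z"},
    {"display_name": "simple_cron_schedule_v2-20220710T170000Z"},
    {"display_name": "manual-run"},
]
matched = jobs_for_schedule(jobs, "simple_cron_schedule")
print([job["display_name"] for job in matched])
# ['simple_cron_schedule-20220710T170000Z']
```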
Note
For a simpler way to find all jobs triggered by a schedule, see the Jobs history on the schedule detail page using the studio UI.
Delete a schedule
Important
A schedule must be disabled to be deleted. Delete is an unrecoverable action. After a schedule is deleted, you can never access or recover it.
APPLIES TO: Azure CLI ml extension v2 (current)
az ml schedule delete -n simple_cron_schedule
RBAC (Role-based-access-control) support
Since schedules are usually used for production, workspace admins may want to restrict access to creating and managing schedules within a workspace, to reduce the impact of misoperation.
Currently there are three action rules related to schedules, and you can configure them in the Azure portal. For more details, see how to manage access to an Azure Machine Learning workspace.
Action | Description | Rule |
---|---|---|
Read | Get and list schedules in Machine Learning workspace | Microsoft.MachineLearningServices/workspaces/schedules/read |
Write | Create, update, disable and enable schedules in Machine Learning workspace | Microsoft.MachineLearningServices/workspaces/schedules/write |
Delete | Delete a schedule in Machine Learning workspace | Microsoft.MachineLearningServices/workspaces/schedules/delete |
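As a rough illustration of how these rules could be combined, the following Python sketch builds a custom role definition in the JSON shape that Azure custom roles use (Name, IsCustom, Description, Actions, NotActions, AssignableScopes). The role name, description, and scope placeholder are assumptions for illustration. A role containing only these actions is unlikely to be sufficient for working in a workspace, so treat this as a fragment to merge into a fuller role rather than a complete definition.

```python
import json

# Rule names taken from the table above.
SCHEDULE_ACTIONS = {
    "read": "Microsoft.MachineLearningServices/workspaces/schedules/read",
    "write": "Microsoft.MachineLearningServices/workspaces/schedules/write",
    "delete": "Microsoft.MachineLearningServices/workspaces/schedules/delete",
}


def schedule_role_definition(role_name, scope, allowed):
    """Return a custom role definition dict granting the chosen schedule actions."""
    unknown = set(allowed) - set(SCHEDULE_ACTIONS)
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    return {
        "Name": role_name,
        "IsCustom": True,
        "Description": "Example role limited to selected schedule actions.",
        "Actions": [SCHEDULE_ACTIONS[a] for a in ("read", "write", "delete") if a in allowed],
        "NotActions": [],
        "AssignableScopes": [scope],
    }


# Hypothetical role name; replace the scope placeholder with your own subscription.
role = schedule_role_definition(
    "Schedule Reader (example)", "/subscriptions/<subscription-id>", ["read"]
)
print(json.dumps(role, indent=2))
```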
Cost considerations
- Schedules are billed based on the number of schedules. Each schedule creates a logic app that Azure Machine Learning hosts on behalf of (HOBO) the user.
- The cost of the logic app is charged back to the user's Azure subscription. HOBO resources are billed using the same meter emitted by the original resource provider (RP), and their costs are shown under the host resource (the workspace).
Frequently asked questions
Why aren't my schedules created by the SDK listed in the UI?
The schedules UI is for v2 schedules. Hence, your v1 schedules won't be listed or accessed via UI.
However, v2 schedules also support v1 pipeline jobs. You don't have to publish pipeline first, and you can directly set up schedules for a pipeline job.
Why don't my schedules trigger jobs at the time I set?
- By default, schedules use the UTC timezone to calculate trigger time. You can specify the timezone in the creation wizard, or update the timezone in the schedule detail page.
- If you set the recurrence as the 31st day of every month, the schedule won't trigger jobs in months with fewer than 31 days.
- If you're using cron expressions, MONTH isn't supported. If you pass a value, it will be ignored and treated as *. This is a known limitation.
Are event-based schedules supported?
- No, V2 schedule does not support event-based schedules.
Next steps
- Learn more about the CLI (v2) schedule YAML schema.
- Learn how to create pipeline job in CLI v2.
- Learn how to create pipeline job in SDK v2.
- Learn more about CLI (v2) core YAML syntax.
- Learn more about Pipelines.
- Learn more about Component.