Create and run machine learning pipelines by using components in Azure Machine Learning studio

APPLIES TO: Azure CLI ml extension v2 (current)

In this article, you learn how to create and run machine learning pipelines by using the Azure Machine Learning studio and components. You can create pipelines without using components, but components offer better flexibility and reuse. Azure Machine Learning Pipelines can be defined in YAML and run from the Azure CLI, authored in Python, or composed in Azure Machine Learning studio Designer with a drag-and-drop UI. This article focuses on the Azure Machine Learning studio Designer UI.

Prerequisites

Note

Designer supports two types of components, classic prebuilt components(v1) and custom components (v2). These two types of components are NOT compatible.

Classic prebuilt components provide prebuilt components mainly for data processing and traditional machine learning tasks like regression and classification. Classic prebuilt components continue to be supported but won't have any new components added. Also, deployment of classic prebuilt (v1) components doesn't support managed online endpoints (v2).

Custom components allow you to wrap your own code as a component. It supports sharing components across workspaces and seamless authoring across studio, CLI v2, and SDK v2 interfaces.

For new projects, we highly recommend that you use custom components, which is compatible with Azure Machine Learning V2 and receives new updates.

This article applies to custom components.

Register a component in your workspace

To build a pipeline using components in the Designer UI, you need to first register components to your workspace. You can use the UI, Azure CLI, or the SDK to register components to your workspace, so that you can share and reuse the component in the workspace. Registered components support automatic versioning so you can update the component but assure that pipelines that require an older version continue to work.

The following example uses the UI to register components. The component source files are in the cli/jobs/pipelines-with-components/basics/1b_e2e_registered_components directory of the azureml-examples repository. You need to clone the repository.

  1. In your Azure Machine Learning workspace, navigate to Components page and select New Component. The Components page appearance differs depending on whether you previously created components.

    Screenshot showing register entry button in component page.

    Screenshot showing register entry button in component page with can include archive.

    This example uses train.yml in the 1b_e2e_registered_components directory. The YAML file defines the name, type, interface including inputs and outputs, code, environment, and command of this component. The code of this component (train.py) is in ./train_src folder. That code describes the execution logic of this component. To learn more about the component schema, see the command component YAML schema reference.

    Note

    For register components in the UI, code defined in the component YAML file can only point to the current folder where YAML file locates or the subfolders. Because the UI can't recognize the parent directory, you can't specify ../.

    additional_includes can only point to the current folder or subfolder.

    Currently, the UI only supports registering components with command type.

  2. Select Folder, then browse to the 1b_e2e_registered_components folder to upload.

  3. Select train.yml from the Yaml file name.

    Screenshot showing upload from local folder.

  4. Select Next, then confirm the details of this component. After you confirm, select Create to finish the registration process.

  5. Repeat the previous steps to register Score and Eval component using score.yml and eval.yml.

  6. After registering the three components successfully, you can see your components in the studio UI.

Screenshot showing registered component in component page.

Create pipeline by using registered component

  1. Create a new pipeline in Designer. Select the Custom option.

    Screenshot showing creating new pipeline in the Designer homepage.

  2. Select the pencil icon to give the pipeline a meaningful name.

    Screenshot showing rename the pipeline.

  3. In the Designer asset library, you can see Data, Model, and Components tabs. Select Components. You can see the components registered from previous section. If there are too many components, you can search with the component name.

    Screenshot showing registered component in asset library.

    Find the train, score, and eval components registered in previous section then drag them to the canvas. By default, Designer uses the default version of the component. To change to a specific version, double-click the component to open the component pane.

    Screenshot showing changing version of component.

  4. In this example, use the sample data in the data folder. To register the data into your workspace, select the add icon in the asset library, then follow the wizard to register the data. The data type needs to be uri_folder to align with the train component definition.

    Screenshot showing add data.

  5. Drag the data into the canvas. Your pipeline should look like the following screenshot.

    Screenshot showing the pipeline draft.

  6. Connect the data and components by dragging connections in the canvas.

    Animation showing connecting the pipeline.

  7. Double click one component, you'll see a right pane where you can configure the component.

    Screenshot showing component parameter settings.

    For components with primitive type inputs like number, integer, string, and boolean, you can change values of such inputs in the component detailed pane, under Inputs section.

    You can also change the output settings (where to store the component's output) and run settings (compute target to run this component) in the right pane.

  8. Promote the max_epocs input of the train component to pipeline level input. By doing so, you can assign a different value to this input every time before submitting the pipeline.

    Screenshot showing how to promote component input to pipeline input.

Note

Custom components and the Designer classic prebuilt components can't be used together.

Submit pipeline

  1. To submit the pipeline, select Configure & Submit.

    Screenshot showing configure and submit button.

  2. Then you'll see a step-by-step wizard. Follow the wizard to submit the pipeline job.

Screenshot showing submission wizard.

In Basics step, you can configure the experiment, job display name, job description etc.

In Inputs & Outputs step, you can configure the Inputs/Outputs that are promoted to pipeline level. In previous step, we promoted the max_epocs of train component to pipeline input, so you should be able to see and assign value to max_epocs here.

In Runtime settings, you can configure the default datastore and default compute of the pipeline. It's the default datastore/compute for all components in the pipeline. But note if you set a different compute or datastore for a component explicitly, the system respects the component level setting. Otherwise, it uses the pipeline default value.

The Review + Submit step is the last step to review all configurations before submit. The wizard remembers your last time's configuration if you ever submit the pipeline.

After submitting the pipeline job, there will be a message on the top with a link to the job detail. You can select this link to review the job details.

Screenshot showing submission message.

Specify identity in pipeline job

When you submit a pipeline job, you can specify the identity to access the data under Run settings. The default identity is AMLToken, which doesn't use any identity. A pipeline can also support UserIdentity and Managed. For UserIdentity, the identity of job submitter is used to access input data and write the result to the output folder. If you specify Managed, the system uses the managed identity to access the input data and write the result to the output folder.

Screenshot showing how to set identity in pipeline job.