Quickstart: Create a datastore in Azure Machine Learning through the UI to link a lakehouse table

Existing solutions can link an Azure Machine Learning resource to OneLake, extract the data, and create a datastore in Azure Machine Learning. However, those solutions handle OneLake data of type "Files" only. They don't work for OneLake data of the table type, such as the table shown in the following screenshot:

Screenshot showing a table in Microsoft Fabric.

Additionally, some customers might prefer to build the link in the UI, but no existing solution links Azure Machine Learning resources to OneLake tables this way. In this article, you learn how to link OneLake tables to Azure Machine Learning resources through the UI.

Prerequisites

  - An Azure subscription.
  - An Azure Machine Learning workspace.
  - A Microsoft Fabric lakehouse that contains a table.
  - An Azure Data Lake Storage account.

Solution structure

This solution has three parts. First, create and configure a Data Lake Storage account in the Azure portal. Next, copy the data from OneLake to the Azure Data Lake Storage account. Lastly, create a datastore in Azure Machine Learning that points to the copied data. The following screenshot shows the overall flow of the solution:

Screenshot showing the overall flow of the solution.

Set up the Data Lake storage account in the Azure portal

Assign the Storage Blob Data Contributor and Storage File Data Privileged Contributor roles to the user identity or service principal. These roles grant permission to use account keys and to create containers. To assign the roles:

  1. Open the Azure portal.

  2. Select the Storage accounts service.

    Screenshot showing selection of Storage Accounts service.

  3. On the Storage accounts page, select the Data Lake Storage account you created in the prerequisite step. A page showing the storage account properties opens.

    Screenshot showing the properties page of the data lake storage account.

  4. Select Access keys from the left panel and record the key. This value is required in a later step.

  5. Enable Allow storage account key access, as shown in the following screenshot:

    Screenshot showing how to enable key access of data lake storage account in Azure portal.

  6. Select Access Control (IAM) from the left panel, and assign the Storage Blob Data Contributor and Storage File Data Privileged Contributor roles to the service principal.

    Screenshot showing how to assign roles of data lake storage account in Azure portal.

  7. Create a container in the storage account. Name it onelake-table.

    Screenshot showing creation of a data lake storage account container in the Azure portal.
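As an alternative to the portal steps above, the container creation can be sketched with the Azure SDK for Python. This is a minimal sketch, assuming the `azure-storage-blob` package is installed; the account name and key below are placeholders for your own values:

```python
# Sketch: create the "onelake-table" container with the Azure SDK for Python.
# Assumes the azure-storage-blob package; account values are placeholders.

def account_url(account_name: str) -> str:
    """Build the Blob service endpoint URL for a storage account."""
    return f"https://{account_name}.blob.core.windows.net"

def create_container(account_name: str, account_key: str,
                     container: str = "onelake-table"):
    # Imported here so account_url stays usable without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(account_url(account_name), credential=account_key)
    # Raises ResourceExistsError if the container already exists.
    return service.create_container(container)

if __name__ == "__main__":
    create_container("<storage-account-name>", "<account-key-from-step-4>")
```

The account key passed as `credential` is the value recorded from the Access keys page in step 4.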

Use a Fabric data pipeline to copy data to an Azure Data Lake Storage account

  1. In the Fabric portal, select Data pipeline on the New item page.

    Screenshot showing selection of data pipeline at the Fabric New item page.

  2. Select Copy data assistant.

    Screenshot showing selection of Copy data assistant.

  3. In the Copy data assistant, select Azure Blobs:

    Screenshot showing selection of Select Azure blobs in the Fabric Copy data assistant.

  4. To create a connection to the Azure Data Lake storage account, select Account key for Authentication kind, and then select Next:

    Screenshot that shows how to create a connection in a Fabric data pipeline.

  5. Select the data destination, and select Next:

    Screenshot that shows selection of the data destination.

  6. Connect to the data destination, and select Next:

    Screenshot that shows connection to the data destination.

  7. This step automatically starts the data copy job:

    Screenshot that shows the copy activity is scheduled.

    The copy might take a while to complete, and it leads directly to the next step.

  8. Check that the data copy job finished successfully:

    Screenshot showing that the copy operation succeeded.
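Besides checking the pipeline status in the Fabric UI, you can confirm the copy by listing the blobs that landed in the container. This is a sketch assuming the `azure-storage-blob` package; account values are placeholders. A copied lakehouse (Delta) table appears as Parquet data files plus a `_delta_log` folder:

```python
# Sketch: verify that the Fabric copy job landed data in the container.
# A copied Delta table shows up as Parquet files plus a _delta_log folder.
# Account values below are placeholders.

def copied_parquet_files(blob_names):
    """Filter a blob listing down to the Parquet data files."""
    return [name for name in blob_names if name.endswith(".parquet")]

def list_container(account_name: str, account_key: str,
                   container: str = "onelake-table"):
    # Deferred import: needs azure-storage-blob installed.
    from azure.storage.blob import ContainerClient

    client = ContainerClient(
        f"https://{account_name}.blob.core.windows.net",
        container_name=container,
        credential=account_key,
    )
    return [blob.name for blob in client.list_blobs()]

if __name__ == "__main__":
    names = list_container("<storage-account-name>", "<account-key>")
    print(f"{len(copied_parquet_files(names))} Parquet files copied")
```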

Create a datastore in Azure Machine Learning that links to the Azure Data Lake Storage container

Now that your data is in the Azure Data Lake storage resource, you can create an Azure Machine Learning datastore.

  1. In the Azure storage account, verify that the container (shown on the left) contains the copied data (shown on the right):

    Screenshot that shows how to verify the data in Azure storage account container.

  2. In Azure Machine Learning studio, create a data asset and select the File (uri_file) type:

    Screenshot showing selection of the File (uri_file) type.

  3. Select From Azure storage:

    Screenshot that shows how to select Azure storage.

  4. Using the account key that you recorded earlier, create a New datastore:

    Screenshot that shows how to create new datastore in Azure Machine Learning.

  5. Alternatively, you can create the datastore directly in Azure Machine Learning studio:

    Screenshot that shows how to create a datastore in Azure Machine Learning.

  6. You can review details of the datastore you created:

    Screenshot that shows details of the datastore you created.

  7. Review the data in the datastore:

    Screenshot that shows how to access a datastore in Azure Machine Learning.
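The same datastore can also be created programmatically with the Azure Machine Learning Python SDK v2. This is a sketch assuming the `azure-ai-ml` and `azure-identity` packages; the workspace identifiers, account values, and the datastore name `onelake_table_store` are placeholders:

```python
# Sketch: create the datastore with the Azure ML Python SDK v2 (azure-ai-ml)
# instead of the studio UI. Workspace identifiers and credentials below are
# placeholders.
import re

def valid_datastore_name(name: str) -> bool:
    """Sanity check: datastore names use lowercase letters, digits, underscores."""
    return re.fullmatch(r"[a-z0-9_]+", name) is not None

def create_datastore(subscription_id, resource_group, workspace,
                     account_name, account_key):
    # Deferred imports: need azure-ai-ml and azure-identity installed.
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import AzureBlobDatastore, AccountKeyConfiguration
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(DefaultAzureCredential(), subscription_id,
                         resource_group, workspace)
    store = AzureBlobDatastore(
        name="onelake_table_store",          # assumed name; choose your own
        account_name=account_name,
        container_name="onelake-table",      # container created earlier
        credentials=AccountKeyConfiguration(account_key=account_key),
    )
    return ml_client.create_or_update(store)
```

After `create_or_update` returns, the datastore appears in the studio Datastores list, just as in the UI steps above.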

Now that you successfully created the datastore in Azure Machine Learning, you can use it in machine learning exercises.
