Create datastores

Applies to: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

This article describes how to use Azure Machine Learning datastores to connect to Azure data storage services.

Prerequisites

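An Azure subscription. If you don't have an Azure subscription, create a trial account before you begin.
The Azure Machine Learning SDK for Python.
A Machine Learning workspace.
Permissions to create a datastore in the workspace and to access the storage account, including the workspace Contributor role and the Storage Blob Data Contributor role.
- Workspace roles: Access control in Azure Machine Learning
- Storage roles: Authorize access to blob data using Microsoft Entra ID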

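Note

Machine Learning datastores don't create the underlying storage account resources. Instead, they link an existing storage account for use by Machine Learning. Machine Learning datastores aren't required. If you have access to the underlying data, you can use storage URIs directly.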
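To make "use storage URIs directly" concrete, here is a minimal sketch that builds the common URI forms for Blob storage and ADLS Gen2. The helper names are hypothetical, and the default `core.windows.net` endpoint suffix is an assumption that holds for the public Azure cloud only; sovereign clouds use a different suffix, so it is exposed as a parameter.

```python
# Hypothetical helpers for building direct storage URIs (not part of azure-ai-ml).
# Assumption: the public-cloud endpoint suffix; pass another suffix for other clouds.
DEFAULT_SUFFIX = "core.windows.net"


def blob_https_uri(account: str, container: str, path: str,
                   suffix: str = DEFAULT_SUFFIX) -> str:
    """HTTPS URI for a path inside a blob container."""
    return f"https://{account}.blob.{suffix}/{container}/{path.lstrip('/')}"


def blob_wasbs_uri(account: str, container: str, path: str,
                   suffix: str = DEFAULT_SUFFIX) -> str:
    """wasbs:// URI for a path inside a blob container."""
    return f"wasbs://{container}@{account}.blob.{suffix}/{path.lstrip('/')}"


def adls_gen2_abfss_uri(account: str, filesystem: str, path: str,
                        suffix: str = DEFAULT_SUFFIX) -> str:
    """abfss:// URI for a path inside an ADLS Gen2 file system."""
    return f"abfss://{filesystem}@{account}.dfs.{suffix}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(blob_https_uri("mytestblobstore", "data-container", "folder/file.csv"))
    print(adls_gen2_abfss_uri("mytestdatalakegen2", "my-gen2-container", "raw/data.parquet"))
```

The account, container, and file system names reuse the sample values from this article; the file paths are made up for illustration.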

Verify access (Python)

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

for datastore in ml_client.datastores.list():
    print(datastore.name)

List the datastores in the workspace to confirm authentication and access.

Reference: MLClient, DatastoreOperations.list

Create an Azure Blob datastore

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = AzureBlobDatastore(
    name="",
    description="",
    account_name="",
    container_name=""
)

ml_client.create_or_update(store)

Create or update a datastore that points to the specified blob container by using identity-based access.

Reference: AzureBlobDatastore, MLClient

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities import AccountKeyConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = AzureBlobDatastore(
    name="blob_protocol_example",
    description="Datastore pointing to a blob container using HTTPS.",
    account_name="mytestblobstore",
    container_name="data-container",
    protocol="https",
    credentials=AccountKeyConfiguration(
        account_key="aaaaaaaa-0b0b-1c1c-2d2d-333333333333"
    ),
)

ml_client.create_or_update(store)

Create or update a datastore that points to the specified blob container by using an account key.

Reference: AzureBlobDatastore, AccountKeyConfiguration, MLClient

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities import SasTokenConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = AzureBlobDatastore(
    name="blob_sas_example",
    description="Datastore pointing to a blob container using a SAS token.",
    account_name="mytestblobstore",
    container_name="data-container",
    credentials=SasTokenConfiguration(
        sas_token="?xx=A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u&xx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1wx&xx=Ff6Gg~7Hh8.-Ii9Jj0Kk1Ll2Mm3Nn4_Oo5Pp6Qq7&xx=N7oP8qR9sT0uV1wX2yZ3aB4cD5eF6g&xxx=Ee5Ff~6Gg7.-Hh8Ii9Jj0Kk1Ll2Mm3_Nn4Oo5Pp6&xxx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1w"
    ),
)

ml_client.create_or_update(store)

Create or update a datastore that points to the specified blob container by using a SAS token.

Reference: AzureBlobDatastore, SasTokenConfiguration, MLClient
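The three Python variants above differ only in the credentials they pass: none for identity-based access, an account key, or a SAS token. The sketch below shows one way to choose between them in a single place. The function name and the returned dictionary shape (which mirrors the `credentials` block of the YAML configuration) are assumptions for illustration, not part of the SDK.

```python
from typing import Optional


def select_blob_credentials(account_key: Optional[str] = None,
                            sas_token: Optional[str] = None) -> dict:
    """Return the credentials portion of a blob datastore configuration.

    Hypothetical helper: an empty dict means identity-based access,
    otherwise exactly one credential type is returned.
    """
    if account_key and sas_token:
        # A datastore accepts a single credentials value, so reject ambiguous input.
        raise ValueError("Provide an account key or a SAS token, not both.")
    if account_key:
        return {"credentials": {"account_key": account_key}}
    if sas_token:
        return {"credentials": {"sas_token": sas_token}}
    return {}  # identity-based access: no credentials block


if __name__ == "__main__":
    print(select_blob_credentials())
    print(select_blob_credentials(account_key="<account-key>"))
```

Failing early on ambiguous input keeps the choice of access mode explicit instead of silently preferring one credential type.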

Create the following YAML file (update the values as appropriate):

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: my_blob_ds # add your datastore name here
type: azure_blob
description: here is a description # add a datastore description here
account_name: my_account_name # add the storage account name here
container_name: my_container_name # add the storage container name here

Define the blob datastore configuration in a YAML file. Provide the datastore name, storage account name, and container name.

Reference: azureBlob.schema.json
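If the YAML file is produced by a script rather than edited by hand, a small renderer like the one below may help. It is a sketch that uses only the standard library; the function name is hypothetical, and it assumes the name, account, and container values are simple strings that need no YAML quoting. The description is emitted as a double-quoted string via `json.dumps`, which is also valid YAML.

```python
import json

SCHEMA_URL = "https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json"


def render_blob_datastore_yaml(name: str, account_name: str,
                               container_name: str, description: str = "") -> str:
    """Render an identity-based blob datastore configuration as YAML text."""
    lines = [
        f"$schema: {SCHEMA_URL}",
        f"name: {name}",
        "type: azure_blob",
    ]
    if description:
        # JSON string syntax doubles as a YAML double-quoted scalar.
        lines.append(f"description: {json.dumps(description)}")
    lines.append(f"account_name: {account_name}")
    lines.append(f"container_name: {container_name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_blob_datastore_yaml(
        "my_blob_ds", "my_account_name", "my_container_name", "here is a description"
    )
    with open("my_blob_datastore.yml", "w", encoding="utf-8") as handle:
        handle.write(text)
```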

Create the Machine Learning datastore in the Azure CLI:

az ml datastore create --file my_blob_datastore.yml

Use the YAML file to create or update the blob datastore in the workspace.

Reference: az ml datastore create
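When the CLI step runs from a Python script (for example in automation), the command can be assembled as an argument list. The sketch below only builds the list; the helper name is hypothetical, and passing `--resource-group` and `--workspace-name` explicitly is an assumption for environments where no CLI defaults are configured.

```python
from typing import List, Optional


def datastore_create_command(yaml_file: str,
                             resource_group: Optional[str] = None,
                             workspace_name: Optional[str] = None) -> List[str]:
    """Build the argument list for `az ml datastore create`."""
    command = ["az", "ml", "datastore", "create", "--file", yaml_file]
    if resource_group:
        command += ["--resource-group", resource_group]
    if workspace_name:
        command += ["--workspace-name", workspace_name]
    return command


if __name__ == "__main__":
    # The resource group and workspace names here are placeholders.
    print(" ".join(datastore_create_command(
        "my_blob_datastore.yml", "my-resource-group", "my-workspace")))
```

To execute the command, the list can be handed to `subprocess.run(command, check=True)`, which avoids shell quoting issues with file names.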

Create this YAML file (update the values as appropriate):

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_example
type: azure_blob
description: Datastore pointing to a blob container.
account_name: mytestblobstore
container_name: data-container
credentials:
  account_key: aaaaaaaa-0b0b-1c1c-2d2d-333333333333

Define a blob datastore configuration that uses an account key for access.

Reference: azureBlob.schema.json

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_blob_datastore.yml

Use the YAML file to create or update the blob datastore in the workspace.

Reference: az ml datastore create

Create this YAML file (update the values as appropriate):

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_sas_example
type: azure_blob
description: Datastore pointing to a blob container using a SAS token.
account_name: mytestblobstore
container_name: data-container
credentials:
  sas_token: "?xx=A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u&xx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1wx&xx=Ff6Gg~7Hh8.-Ii9Jj0Kk1Ll2Mm3Nn4_Oo5Pp6Qq7&xx=N7oP8qR9sT0uV1wX2yZ3aB4cD5eF6g&xxx=Ee5Ff~6Gg7.-Hh8Ii9Jj0Kk1Ll2Mm3_Nn4Oo5Pp6&xxx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1w"

Define a blob datastore configuration that uses a SAS token for access.

Reference: azureBlob.schema.json

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_blob_datastore.yml

Use the YAML file to create or update the blob datastore in the workspace.

Reference: `az ml datastore create`

Create an Azure Data Lake Storage Gen2 datastore

from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = AzureDataLakeGen2Datastore(
    name="",
    description="",
    account_name="",
    filesystem=""
)

ml_client.create_or_update(store)

Create or update a datastore that points to the specified ADLS Gen2 file system by using identity-based access.

Reference: AzureDataLakeGen2Datastore, MLClient

from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml.entities import ServicePrincipalConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = AzureDataLakeGen2Datastore(
    name="adls_gen2_example",
    description="Datastore pointing to an Azure Data Lake Storage Gen2 instance.",
    account_name="mytestdatalakegen2",
    filesystem="my-gen2-container",
    # Service principal used to access the storage account
    credentials=ServicePrincipalConfiguration(
        tenant_id="bbbbcccc-1111-dddd-2222-eeee3333ffff",
        client_id="44445555-eeee-6666-ffff-7777aaaa8888",
        client_secret="Cc3Dd~4Ee5.-Ff6Gg7Hh8Ii9Jj0Kk1_Ll2Mm3Nn4",
    ),
)

ml_client.create_or_update(store)

Create or update a datastore that points to the specified ADLS Gen2 file system by using a service principal.

Reference: AzureDataLakeGen2Datastore, MLClient
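The service principal example above embeds the tenant ID, client ID, and client secret in source code. A common alternative is to read them from environment variables, sketched below with the standard library only. The dataclass and function are hypothetical; the variable names `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET` are the ones the azure-identity library reads for its environment credential, and reusing them for datastore credentials is an assumption of this sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Maps each setting to the environment variable it is read from.
ENV_NAMES = {
    "tenant_id": "AZURE_TENANT_ID",
    "client_id": "AZURE_CLIENT_ID",
    "client_secret": "AZURE_CLIENT_SECRET",
}


@dataclass(frozen=True)
class ServicePrincipalSettings:
    tenant_id: str
    client_id: str
    client_secret: str


def load_service_principal(
        environ: Optional[Mapping[str, str]] = None) -> ServicePrincipalSettings:
    """Read service principal settings, failing with a list of missing variables."""
    env = os.environ if environ is None else environ
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return ServicePrincipalSettings(
        **{field: env[var] for field, var in ENV_NAMES.items()}
    )
```

The three fields can then be passed as the `tenant_id`, `client_id`, and `client_secret` arguments of the credentials object when the datastore is created.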

Create this YAML file (update the values):

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_credless_example
type: azure_data_lake_gen2
description: Datastore pointing to an Azure Data Lake Storage Gen2 instance.
account_name: mytestdatalakegen2
filesystem: my-gen2-container

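Define the ADLS Gen2 datastore configuration in a YAML file. Provide the datastore name, storage account name, and file system.

Reference: azureDataLakeGen2.schema.json

Create the Machine Learning datastore in the CLI: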

az ml datastore create --file my_adls_datastore.yml

Use the YAML file to create or update the ADLS Gen2 datastore in the workspace.

Reference: az ml datastore create

Create this YAML file (update the values):

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_example
type: azure_data_lake_gen2
description: Datastore pointing to an Azure Data Lake Storage Gen2 instance.
account_name: mytestdatalakegen2
filesystem: my-gen2-container
credentials:
  tenant_id: bbbbcccc-1111-dddd-2222-eeee3333ffff
  client_id: 44445555-eeee-6666-ffff-7777aaaa8888
  client_secret: Cc3Dd~4Ee5.-Ff6Gg7Hh8Ii9Jj0Kk1_Ll2Mm3Nn4

Define an ADLS Gen2 datastore configuration that uses a service principal for access.

Reference: azureDataLakeGen2.schema.json

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_adls_datastore.yml

Use the YAML file to create or update the ADLS Gen2 datastore in the workspace.

Reference: `az ml datastore create`

Create an Azure Files datastore

from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities import AccountKeyConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = AzureFileDatastore(
    name="file_example",
    description="Datastore pointing to an Azure File Share.",
    account_name="mytestfilestore",
    file_share_name="my-share",
    credentials=AccountKeyConfiguration(
        account_key="aaaaaaaa-0b0b-1c1c-2d2d-333333333333"
    ),
)

ml_client.create_or_update(store)

Create or update a datastore that points to the specified Azure file share by using an account key.

Reference: AzureFileDatastore, AccountKeyConfiguration, MLClient

from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities import SasTokenConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = AzureFileDatastore(
    name="file_sas_example",
    description="Datastore pointing to an Azure File Share using a SAS token.",
    account_name="mytestfilestore",
    file_share_name="my-share",
    credentials=SasTokenConfiguration(
        sas_token="?xx=A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u&xx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1wx&xx=Ff6Gg~7Hh8.-Ii9Jj0Kk1Ll2Mm3Nn4_Oo5Pp6Qq7&xx=N7oP8qR9sT0uV1wX2yZ3aB4cD5eF6g&xxx=Ee5Ff~6Gg7.-Hh8Ii9Jj0Kk1Ll2Mm3_Nn4Oo5Pp6&xxx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1w"
    ),
)

ml_client.create_or_update(store)

Create or update a datastore that points to the specified Azure file share by using a SAS token.

Reference: AzureFileDatastore, SasTokenConfiguration, MLClient
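A SAS token carries its own expiry time, so a datastore registered with an expired token will fail when it is used. The sketch below inspects a token before registration. It assumes a standard SAS query string whose expiry is stored in the `se` parameter; the placeholder tokens in this article use `xx` keys and therefore yield no expiry. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs


def sas_expiry(sas_token: str) -> Optional[datetime]:
    """Return the token's expiry (`se` parameter) as an aware UTC datetime, if present."""
    params = parse_qs(sas_token.lstrip("?"))
    values = params.get("se")
    if not values:
        return None
    # Normalize the trailing "Z" so datetime.fromisoformat accepts it on older Pythons.
    expiry = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # SAS times are UTC
    return expiry


def sas_is_expired(sas_token: str, now: Optional[datetime] = None) -> bool:
    """True when the token has an expiry that is not in the future."""
    expiry = sas_expiry(sas_token)
    if expiry is None:
        return False  # no expiry found; nothing to check
    current = now if now is not None else datetime.now(timezone.utc)
    return expiry <= current


if __name__ == "__main__":
    # Made-up token for illustration; the signature is a placeholder.
    token = "?sv=2022-11-02&se=2030-01-01T00%3A00%3A00Z&sp=rl&sig=placeholder"
    print(sas_expiry(token), sas_is_expired(token))
```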

Create this YAML file (update the values):

# my_files_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: file_example
type: azure_file
description: Datastore pointing to an Azure File Share.
account_name: mytestfilestore
file_share_name: my-share
credentials:
  account_key: aaaaaaaa-0b0b-1c1c-2d2d-333333333333

Define an Azure Files datastore configuration that uses an account key for access.

Reference: azureFile.schema.json

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_files_datastore.yml

Use the YAML file to create or update the Azure Files datastore in the workspace.

Reference: az ml datastore create

Create this YAML file (update the values):

# my_files_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: file_sas_example
type: azure_file
description: Datastore pointing to an Azure File Share using a SAS token.
account_name: mytestfilestore
file_share_name: my-share
credentials:
  sas_token: "?xx=A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u&xx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1wx&xx=Ff6Gg~7Hh8.-Ii9Jj0Kk1Ll2Mm3Nn4_Oo5Pp6Qq7&xx=N7oP8qR9sT0uV1wX2yZ3aB4cD5eF6g&xxx=Ee5Ff~6Gg7.-Hh8Ii9Jj0Kk1Ll2Mm3_Nn4Oo5Pp6&xxx=C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1w"

Define an Azure Files datastore configuration that uses a SAS token for access.

Reference: azureFile.schema.json

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_files_datastore.yml

Use the YAML file to create or update the Azure Files datastore in the workspace.

Reference: `az ml datastore create`

Create a OneLake datastore

from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = OneLakeDatastore(
    name="onelake_example_id",
    description="Datastore pointing to a Microsoft Fabric artifact.",
    one_lake_workspace_name="bbbbbbbb-7777-8888-9999-cccccccccccc", #{your_one_lake_workspace_guid}
    endpoint="msit-onelake.dfs.fabric.microsoft.com", #{your_one_lake_endpoint}
    artifact=OneLakeArtifact(
        name="cccccccc-8888-9999-0000-dddddddddddd/Files", #{your_one_lake_artifact_guid}/Files
        type="lake_house"
    ),
)

ml_client.create_or_update(store)

Create or update a OneLake datastore that points to the specified lakehouse by using identity-based access.

Reference: OneLakeDatastore, OneLakeArtifact, MLClient
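The OneLake example identifies the workspace by GUID and the artifact by `<artifact GUID>/Files`, as its inline comments indicate. A quick format check before calling the service can catch copy-and-paste mistakes. This is a sketch with hypothetical helper names, and it assumes GUID-based identifiers exactly as in the sample values.

```python
import uuid


def is_guid(value: str) -> bool:
    """True when value is a GUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def is_onelake_artifact_name(name: str) -> bool:
    """True when name has the form '<guid>/Files' used in the sample."""
    guid, separator, folder = name.partition("/")
    return is_guid(guid) and separator == "/" and folder == "Files"


if __name__ == "__main__":
    print(is_guid("bbbbbbbb-7777-8888-9999-cccccccccccc"))
    print(is_onelake_artifact_name("cccccccc-8888-9999-0000-dddddddddddd/Files"))
```

Comparing against the canonical string form rejects inputs that `uuid.UUID` would otherwise accept, such as GUIDs without hyphens or wrapped in braces.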

from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact
from azure.ai.ml.entities import ServicePrincipalConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = OneLakeDatastore(
    name="onelake_example_sp",
    description="Datastore pointing to a Microsoft Fabric artifact.",
    one_lake_workspace_name="bbbbbbbb-7777-8888-9999-cccccccccccc", #{your_one_lake_workspace_guid}
    endpoint="msit-onelake.dfs.fabric.microsoft.com", #{your_one_lake_endpoint}
    artifact=OneLakeArtifact(
        name="cccccccc-8888-9999-0000-dddddddddddd/Files", #{your_one_lake_artifact_guid}/Files
        type="lake_house"
    ),
    # Service principal used to access the OneLake artifact
    credentials=ServicePrincipalConfiguration(
        tenant_id="bbbbcccc-1111-dddd-2222-eeee3333ffff",
        client_id="44445555-eeee-6666-ffff-7777aaaa8888",
        client_secret="Cc3Dd~4Ee5.-Ff6Gg7Hh8Ii9Jj0Kk1_Ll2Mm3Nn4",
    ),
)

ml_client.create_or_update(store)

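Create or update a OneLake datastore that points to the specified lakehouse by using a service principal.

Reference: OneLakeDatastore, OneLakeArtifact, MLClient

Create the following YAML file. Update the values: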

# my_onelake_datastore.yml
$schema: http://azureml/sdk-2-0/OneLakeDatastore.json
name: onelake_example_id
type: one_lake
description: Datastore pointing to a OneLake lakehouse.
one_lake_workspace_name: "eeeeffff-4444-aaaa-5555-bbbb6666cccc"
endpoint: "msit-onelake.dfs.fabric.microsoft.com"
artifact:
  type: lake_house
  name: "1111bbbb-22cc-dddd-ee33-ffffff444444/Files"

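Define the OneLake datastore configuration in a YAML file. Provide the workspace, endpoint, and artifact values.

Create the Machine Learning datastore in the CLI: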

az ml datastore create --file my_onelake_datastore.yml

Use the YAML file to create or update the OneLake datastore in the workspace.

Reference: az ml datastore create

Create the following YAML file. Update the values:

# my_onelakesp_datastore.yml
$schema: http://azureml/sdk-2-0/OneLakeDatastore.json
name: onelake_example_sp
type: one_lake
description: Datastore pointing to a OneLake lakehouse.
one_lake_workspace_name: "eeeeffff-4444-aaaa-5555-bbbb6666cccc"
endpoint: "msit-onelake.dfs.fabric.microsoft.com"
artifact:
  type: lake_house
  name: "1111bbbb-22cc-dddd-ee33-ffffff444444/Files"
credentials:
  tenant_id: bbbbcccc-1111-dddd-2222-eeee3333ffff
  client_id: 44445555-eeee-6666-ffff-7777aaaa8888
  client_secret: Cc3Dd~4Ee5.-Ff6Gg7Hh8Ii9Jj0Kk1_Ll2Mm3Nn4

Define a OneLake datastore configuration that uses a service principal for access.

Create the Machine Learning datastore in the CLI:

az ml datastore create --file my_onelakesp_datastore.yml

Use the YAML file to create or update the OneLake datastore in the workspace.

Reference: `az ml datastore create`

Troubleshooting

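Issue	Cause	Resolution
403 or AuthorizationFailed when creating a datastore	A workspace or storage account role assignment is missing	Verify that you have the required workspace and storage roles, then retry the command.
`DefaultAzureCredential` authentication fails	No valid credential source was found	Run `az login` or configure the service principal environment variables.
Storage access is denied when using identity-based access	The storage account lacks data-plane permissions for your identity	Assign the correct storage data role to the identity, then retry.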
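The table above can be turned into a rough triage helper for scripts that create datastores. The sketch below matches keywords in an error message and returns the corresponding resolution. The keyword matching is a heuristic assumed for illustration; actual error text may differ, so treat the result as a hint rather than a diagnosis.

```python
def suggest_resolution(error_message: str) -> str:
    """Map an error message to a resolution from the troubleshooting table."""
    text = error_message.lower()
    if "defaultazurecredential" in text:
        return "Run `az login` or configure the service principal environment variables."
    if "403" in text or "authorizationfailed" in text:
        return "Verify that you have the required workspace and storage roles, then retry."
    if "denied" in text:
        return "Assign the correct storage data role to the identity, then retry."
    return "No matching entry; check the full error details."


if __name__ == "__main__":
    # Made-up error message for illustration.
    print(suggest_resolution("(AuthorizationFailed) The client does not have authorization."))
```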

Next steps

Last updated on 2026-02-05
