将数据存储管理升级到 SDK v2

Azure 机器学习数据存储将连接信息安全地存储在 Azure 上的数据存储中,因此无需在脚本中对其进行编码。 与 V1 相比,V2 数据存储概念大部分保持不变。 区别在于,我们不会通过 AzureML 数据存储支持类似 SQL 的数据源。 我们将通过 AzureML 数据导入和导出功能支持类似 SQL 的数据源。

本文比较 SDK v1 和 SDK v2 中的方案。

通过 account_key 从 Azure Blob 容器创建数据存储

  • SDK v1

    blob_datastore_name='azblobsdk' # Name of the datastore to workspace
    container_name=os.getenv("BLOB_CONTAINER", "<my-container-name>") # Name of Azure blob container
    account_name=os.getenv("BLOB_ACCOUNTNAME", "<my-account-name>") # Storage account name
    account_key=os.getenv("BLOB_ACCOUNT_KEY", "<my-account-key>") # Storage account access key
    
    blob_datastore = Datastore.register_azure_blob_container(workspace=ws, 
                                                             datastore_name=blob_datastore_name, 
                                                             container_name=container_name, 
                                                             account_name=account_name,
                                                             account_key=account_key)
    
  • SDK v2

    from azure.ai.ml.entities import AzureBlobDatastore
    from azure.ai.ml import MLClient
    
    ml_client = MLClient.from_config()
    
    store = AzureBlobDatastore(
        name="blob-protocol-example",
        description="Datastore pointing to a blob container using wasbs protocol.",
        account_name="mytestblobstore",
        container_name="data-container",
        protocol="wasbs",
        credentials={
            "account_key": "XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX"
        },
    )
    
    ml_client.create_or_update(store)
    

通过 sas_token 从 Azure Blob 容器创建数据存储

  • SDK v1

    blob_datastore_name='azblobsdk' # Name of the datastore to workspace
    container_name=os.getenv("BLOB_CONTAINER", "<my-container-name>") # Name of Azure blob container
    sas_token=os.getenv("BLOB_SAS_TOKEN", "<my-sas-token>") # Sas token
    
    blob_datastore = Datastore.register_azure_blob_container(workspace=ws, 
                                                             datastore_name=blob_datastore_name, 
                                                             container_name=container_name, 
                                                             sas_token=sas_token)
    
  • SDK v2

    from azure.ai.ml.entities import AzureBlobDatastore
    from azure.ai.ml import MLClient
    
    ml_client = MLClient.from_config()
    
    store = AzureBlobDatastore(
        name="blob-sas-example",
        description="Datastore pointing to a blob container using SAS token.",
        account_name="mytestblobstore",
        container_name="data-container",
        credentials=SasTokenCredentials(
            sas_token= "?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX"
        ),
    )
    
    ml_client.create_or_update(store)
    

通过基于标识的身份验证从 Azure Blob 容器创建数据存储

  • SDK v1
blob_datastore = Datastore.register_azure_blob_container(workspace=ws,
                                                      datastore_name='credentialless_blob',
                                                      container_name='my_container_name',
                                                      account_name='my_account_name')

  • SDK v2

    from azure.ai.ml.entities import AzureBlobDatastore
    from azure.ai.ml import MLClient
    
    ml_client = MLClient.from_config()
    
    store = AzureBlobDatastore(
        name="",
        description="",
        account_name="",
        container_name=""
    )
    
    ml_client.create_or_update(store)
    

从工作区获取数据存储

  • SDK v1

    # Get a named datastore from the current workspace
    datastore = Datastore.get(ws, datastore_name='your datastore name')
    
    # List all datastores registered in the current workspace
    datastores = ws.datastores
    for name, datastore in datastores.items():
        print(name, datastore.datastore_type)
    
  • SDK v2

    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential
    
    #Enter details of your AzureML workspace
    subscription_id = '<SUBSCRIPTION_ID>'
    resource_group = '<RESOURCE_GROUP>'
    workspace_name = '<AZUREML_WORKSPACE_NAME>'
    
    ml_client = MLClient(credential=DefaultAzureCredential(),
                         subscription_id=subscription_id, 
                         resource_group_name=resource_group)
    
    datastore = ml_client.datastores.get(datastore_name='your datastore name')
    

SDK v1 和 SDK v2 中关键功能的映射

SDK v1 中的存储类型 SDK v2 中的存储类型
azureml_blob_datastore azureml_blob_datastore
azureml_data_lake_gen2_datastore azureml_data_lake_gen2_datastore
azuremlml_sql_database_datastore 将通过导入和导出功能获得支持
azuremlml_my_sql_datastore 将通过导入和导出功能获得支持
azuremlml_postgre_sql_datastore 将通过导入和导出功能获得支持

后续步骤

有关详细信息,请参阅: