Increase Azure Machine Learning resiliency

In this article, you'll learn how to make your Microsoft Azure Machine Learning resources more resilient by using high-availability configurations. Azure Machine Learning depends on several Azure services, and you can configure those services for high availability. This article identifies the services you can configure and links to additional information on configuring them.

Note

Azure Machine Learning itself does not offer a disaster recovery option.

Understand Azure services for Azure Machine Learning

Azure Machine Learning depends on multiple Azure services and has several layers. Some of these services are provisioned in your (customer) subscription, and you're responsible for their high-availability configuration. Other services are created in a Microsoft subscription and managed by Microsoft.

Azure services include:

  • Azure Machine Learning infrastructure: A Microsoft-managed environment for the Azure Machine Learning workspace.

  • Associated resources: Resources provisioned in your subscription during Azure Machine Learning workspace creation. These resources include Azure Storage, Azure Key Vault, Azure Container Registry, and Application Insights. You're responsible for configuring high-availability settings for these resources.

    • The default storage account holds data such as models, training logs, and datasets.
    • Key Vault stores credentials for Azure Storage, Container Registry, and data stores.
    • Container Registry stores the Docker images used for training and inferencing environments.
    • Application Insights is used to monitor Azure Machine Learning.

  • Compute resources: Resources you create after workspace deployment. For example, you might create a compute instance or compute cluster to train a machine learning model.

    • Compute instance and compute cluster: Microsoft-managed model development environments.
    • Other resources: Microsoft computing resources that you can attach to Azure Machine Learning, such as Azure Kubernetes Service (AKS), Azure Databricks, Azure Container Instances, and Azure HDInsight. You're responsible for configuring high-availability settings for these resources.

  • Additional data stores: Azure Machine Learning can mount additional data stores such as Azure Storage, Azure Data Lake Storage, and Azure SQL Database for training data. These data stores are provisioned within your subscription. You're responsible for configuring their high-availability settings.

The following table shows which Azure services are managed by Microsoft, which are managed by you, and which are highly available by default.

| Service | Managed by | High availability by default |
| --- | --- | --- |
| Azure Machine Learning infrastructure | Microsoft | |
| Associated resources | | |
| Azure Storage | You | |
| Key Vault | You | |
| Container Registry | You | |
| Application Insights | You | NA |
| Compute resources | | |
| Compute instance | Microsoft | |
| Compute cluster | Microsoft | |
| Other compute resources such as AKS, Azure Databricks, Container Instances, HDInsight | You | |
| Additional data stores such as Azure Storage, SQL Database, Azure Database for PostgreSQL, Azure Database for MySQL, Azure Databricks File System | You | |
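
As a rough illustration of the split in responsibility described above, the following Python sketch encodes the "Managed by" column and lists the services whose high-availability settings fall to you. The `Service` class, the `SERVICES` list, and the `customer_managed` helper are hypothetical names introduced only for this example; they are not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """One row of the responsibility table (hypothetical helper type)."""
    name: str
    group: str
    managed_by: str  # "Microsoft" or "You"


# Rows transcribed from the table above; the grouping labels are illustrative.
SERVICES = [
    Service("Azure Machine Learning infrastructure", "Infrastructure", "Microsoft"),
    Service("Azure Storage", "Associated resources", "You"),
    Service("Key Vault", "Associated resources", "You"),
    Service("Container Registry", "Associated resources", "You"),
    Service("Application Insights", "Associated resources", "You"),
    Service("Compute instance", "Compute resources", "Microsoft"),
    Service("Compute cluster", "Compute resources", "Microsoft"),
    Service("Other compute resources", "Compute resources", "You"),
    Service("Additional data stores", "Additional data stores", "You"),
]


def customer_managed(services):
    """Return the names of services whose HA configuration is your responsibility."""
    return [s.name for s in services if s.managed_by == "You"]


if __name__ == "__main__":
    for name in customer_managed(SERVICES):
        print(name)
```

A list like this can serve as a checklist when reviewing a deployment: every name it returns needs its own high-availability configuration.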

The rest of this article describes the actions you need to take to make each of these services highly available.

Associated resources

Important

Azure Machine Learning does not support default storage-account failover using geo-redundant storage (GRS), geo-zone-redundant storage (GZRS), read-access geo-redundant storage (RA-GRS), or read-access geo-zone-redundant storage (RA-GZRS).
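
One way to act on this restriction is a small pre-deployment check that flags a default storage account whose redundancy option is one of the four geo-redundant types listed above, so that nobody plans around an account failover the workspace does not support. The sketch below is a minimal example under stated assumptions: the SKU strings follow the usual Azure Storage naming (`Standard_GRS` and so on), the function name `uses_geo_redundant_sku` is hypothetical, and the SKU name is assumed to have been retrieved elsewhere.

```python
# Geo-redundant SKU names, assuming the standard Azure Storage SKU naming.
GEO_REDUNDANT_SKUS = frozenset(
    {"standard_grs", "standard_gzrs", "standard_ragrs", "standard_ragzrs"}
)


def uses_geo_redundant_sku(sku_name: str) -> bool:
    """Return True if the SKU is GRS, GZRS, RA-GRS, or RA-GZRS.

    A True result means storage-account failover is available at the storage
    level, but it is not supported for the workspace's default storage account.
    """
    return sku_name.strip().lower() in GEO_REDUNDANT_SKUS


if __name__ == "__main__":
    for sku in ("Standard_LRS", "Standard_ZRS", "Standard_RAGRS"):
        flag = "geo-redundant" if uses_geo_redundant_sku(sku) else "not geo-redundant"
        print(f"{sku}: {flag}")
```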

Make sure to configure the high-availability settings of each resource by referring to the following documentation:

Compute resources

Make sure to configure the high-availability settings of each resource by referring to the following documentation:

Additional data stores

Make sure to configure the high-availability settings of each resource by referring to the following documentation:

Azure Cosmos DB

If you provide your own customer-managed key to deploy an Azure Machine Learning workspace, Azure Cosmos DB is also provisioned within your subscription. In that case, you're responsible for configuring its high-availability settings. See High availability with Azure Cosmos DB.

Next steps

To deploy Azure Machine Learning with associated resources that use your high-availability settings, use an Azure Resource Manager template.