Increase Azure Machine Learning resiliency
In this article, you'll learn how to make your Microsoft Azure Machine Learning resources more resilient by using high-availability configurations. You can configure the Azure services that Azure Machine Learning depends on for high availability. This article identifies the services you can configure for high availability and links to additional information on configuring them.
Note
Azure Machine Learning itself does not offer a disaster recovery option.
Understand Azure services for Azure Machine Learning
Azure Machine Learning depends on multiple Azure services and has several layers. Some of these services are provisioned in your (customer) subscription, and you're responsible for their high-availability configuration. Other services are created in a Microsoft subscription and managed by Microsoft.
Azure services include:
Azure Machine Learning infrastructure: A Microsoft-managed environment for the Azure Machine Learning workspace.
Associated resources: Resources provisioned in your subscription during Azure Machine Learning workspace creation. These resources include Azure Storage, Azure Key Vault, Azure Container Registry, and Application Insights. You're responsible for configuring high-availability settings for these resources.
- Default storage holds data such as models, training logs, and datasets.
- Key Vault holds credentials for Azure Storage, Container Registry, and data stores.
- Container Registry holds Docker images for training and inferencing environments.
- Application Insights is for monitoring Azure Machine Learning.
Compute resources: Resources you create after workspace deployment. For example, you might create a compute instance or compute cluster to train a machine learning model.
- Compute instance and compute cluster: Microsoft-managed model development environments.
- Other resources: Microsoft computing resources that you can attach to Azure Machine Learning, such as Azure Kubernetes Service (AKS), Azure Databricks, Azure Container Instances, and Azure HDInsight. You're responsible for configuring high-availability settings for these resources.
Additional data stores: Azure Machine Learning can mount additional data stores such as Azure Storage, Azure Data Lake Storage, and Azure SQL Database for training data. These data stores are provisioned within your subscription. You're responsible for configuring their high-availability settings.
The following table shows which Azure services are managed by Microsoft, which are managed by you, and which are highly available by default.
Service | Managed by | High availability by default |
---|---|---|
Azure Machine Learning infrastructure | Microsoft | |
Associated resources | ||
Azure Storage | You | |
Key Vault | You | ✓ |
Container Registry | You | |
Application Insights | You | NA |
Compute resources | ||
Compute instance | Microsoft | |
Compute cluster | Microsoft | |
Other compute resources such as AKS, Azure Databricks, Container Instances, HDInsight | You | |
Additional data stores such as Azure Storage, SQL Database, Azure Database for PostgreSQL, Azure Database for MySQL, Azure Databricks File System | You | |
The rest of this article describes the actions you need to take to make each of these services highly available.
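As a rough illustration, the responsibility table above can be encoded as data and filtered for the services that need action from you. This is only a sketch: the `Service` class and `needs_configuration` function are hypothetical names, and a blank "High availability by default" cell is assumed to mean "not marked as highly available by default".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Service:
    name: str
    managed_by: str                 # "Microsoft" or "You", as in the table
    ha_by_default: Optional[bool]   # True = checkmark, False = blank cell (assumption), None = NA


# Rows copied from the table; group header rows are omitted.
SERVICES = [
    Service("Azure Machine Learning infrastructure", "Microsoft", False),
    Service("Azure Storage", "You", False),
    Service("Key Vault", "You", True),
    Service("Container Registry", "You", False),
    Service("Application Insights", "You", None),
    Service("Compute instance", "Microsoft", False),
    Service("Compute cluster", "Microsoft", False),
    Service("Other compute resources", "You", False),
    Service("Additional data stores", "You", False),
]


def needs_configuration(services: List[Service]) -> List[str]:
    """Return names of services you manage that are not highly available by default.

    Services marked NA (None) are skipped because they expose no
    high-availability setting to configure.
    """
    return [
        s.name
        for s in services
        if s.managed_by == "You" and s.ha_by_default is False
    ]


if __name__ == "__main__":
    for name in needs_configuration(SERVICES):
        print(name)
```

Running the script lists the rows that the remaining sections address: Azure Storage, Container Registry, other compute resources, and additional data stores.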
Associated resources
Important
Azure Machine Learning does not support default storage-account failover using geo-redundant storage (GRS), geo-zone-redundant storage (GZRS), read-access geo-redundant storage (RA-GRS), or read-access geo-zone-redundant storage (RA-GZRS).
Make sure to configure the high-availability settings of each resource by referring to the following documentation:
- Azure Storage: To configure high-availability settings, see Azure Storage redundancy.
- Key Vault: Key Vault provides high availability by default and requires no user action. See Azure Key Vault availability and redundancy.
- Container Registry: Choose the Premium registry option for geo-replication. See Geo-replication in Azure Container Registry.
- Application Insights: Application Insights doesn't provide high-availability settings. To adjust the data-retention period and details, see Data collection, retention, and storage in Application Insights.
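The guidance above can be turned into a small pre-deployment check. The following is a minimal sketch, not an official tool: `review_associated_resources` is a hypothetical helper, and it assumes you pass in the standard Azure SKU names (for example `Standard_GRS` for Storage, `Premium` for Container Registry) read from your own deployment configuration.

```python
from typing import List

# Storage SKUs whose failover the article says is not supported for the
# default storage account (GRS, GZRS, RA-GRS, RA-GZRS).
GEO_FAILOVER_SKUS = {
    "Standard_GRS",
    "Standard_GZRS",
    "Standard_RAGRS",
    "Standard_RAGZRS",
}


def review_associated_resources(storage_sku: str, registry_sku: str) -> List[str]:
    """Return human-readable findings for the planned associated resources."""
    findings = []
    if storage_sku in GEO_FAILOVER_SKUS:
        findings.append(
            f"Storage SKU {storage_sku}: account failover of the default "
            "storage account is not supported by Azure Machine Learning."
        )
    if registry_sku != "Premium":
        findings.append(
            f"Container Registry SKU {registry_sku}: geo-replication "
            "requires the Premium option."
        )
    return findings


if __name__ == "__main__":
    for finding in review_associated_resources("Standard_GRS", "Basic"):
        print("-", finding)
```

An empty result means neither of the two conditions called out in this section applies; it does not by itself confirm that the resources are highly available.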
Compute resources
Make sure to configure the high-availability settings of each resource by referring to the following documentation:
- Azure Kubernetes Service: See Best practices for business continuity and disaster recovery in Azure Kubernetes Service (AKS) and Create an Azure Kubernetes Service (AKS) cluster that uses availability zones. If the AKS cluster was created by using Azure Machine Learning studio, the SDK, or the CLI, cross-region high availability is not supported.
- Container Instances: An orchestrator is responsible for failover. See Azure Container Instances and container orchestrators.
- HDInsight: See High availability services supported by Azure HDInsight.
Additional data stores
Make sure to configure the high-availability settings of each resource by referring to the following documentation:
- Azure Blob container / Azure Files / Data Lake Storage Gen2: Same as default storage.
- SQL Database: See High availability for Azure SQL Database and SQL Managed Instance.
- Azure Database for PostgreSQL: See High availability concepts in Azure Database for PostgreSQL - Single Server.
- Azure Database for MySQL: See Understand business continuity in Azure Database for MySQL.
Azure Cosmos DB
If you provide your own customer-managed key to deploy an Azure Machine Learning workspace, Azure Cosmos DB is also provisioned within your subscription. In that case, you're responsible for configuring its high-availability settings. See High availability with Azure Cosmos DB.
Next steps
To deploy Azure Machine Learning with associated resources that use your high-availability settings, use an Azure Resource Manager template.
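As a starting point, the fragment below sketches how the SKU choices for two associated resources might appear in a Resource Manager template, generated here from Python. It is an illustrative fragment under stated assumptions, not a complete workspace deployment: `build_template` is a hypothetical helper, the parameter names are made up, the `apiVersion` values are assumptions you should check against the current resource reference, and zone-redundant storage (`Standard_ZRS`) is only one possible redundancy choice.

```python
import json


def build_template(storage_sku: str = "Standard_ZRS",
                   registry_sku: str = "Premium") -> dict:
    """Build a partial ARM template for a storage account and a container registry."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            # Hypothetical parameter names, chosen for this example.
            "storageName": {"type": "string"},
            "registryName": {"type": "string"},
        },
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2022-09-01",  # assumption: verify before use
                "name": "[parameters('storageName')]",
                "location": "[resourceGroup().location]",
                "sku": {"name": storage_sku},
                "kind": "StorageV2",
            },
            {
                "type": "Microsoft.ContainerRegistry/registries",
                "apiVersion": "2023-07-01",  # assumption: verify before use
                "name": "[parameters('registryName')]",
                "location": "[resourceGroup().location]",
                # Premium is the option the article names for geo-replication.
                "sku": {"name": registry_sku},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

The workspace resource itself, Key Vault, and Application Insights are left out to keep the fragment short; a real template would define them and reference these resources from the workspace.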