Plan for Azure maintenance events in Azure SQL Database and Azure SQL Managed Instance

Applies to: Azure SQL Database, Azure SQL Managed Instance

Learn how to prepare for planned maintenance events on your database in Azure SQL Database and Azure SQL Managed Instance.

What is a planned maintenance event?

To keep the Azure SQL Database and Azure SQL Managed Instance services secure, compliant, stable, and performant, updates are applied to the service components almost continuously. Thanks to the modern, robust service architecture and innovative technologies such as hot patching, the majority of updates are fully transparent and have no impact on service availability. Still, a few types of updates cause short service interruptions and require special treatment.

For each database, Azure SQL Database and Azure SQL Managed Instance maintain a quorum of database replicas, one of which is the primary. At all times, a primary replica must be online and servicing requests, and at least one secondary replica must be healthy. During planned maintenance, members of the database quorum go offline one at a time, with the intent that one responding primary replica and at least one secondary replica stay online so that clients see no downtime. When the primary replica needs to be brought offline, a reconfiguration/failover process occurs in which one secondary replica becomes the new primary.

What to expect during a planned maintenance event

A maintenance event can produce a single failover or multiple failovers, depending on the constellation of the primary and secondary replicas at the beginning of the maintenance event. On average, 1.7 failovers occur per planned maintenance event. Reconfigurations/failovers generally finish within 30 seconds; the average is 8 seconds. If already connected, your application must reconnect to the new primary replica of your database. If a new connection is attempted while the database is undergoing a reconfiguration, before the new primary replica is online, you get error 40613 (Database Unavailable): "Database '{databasename}' on server '{servername}' is not currently available. Please retry the connection later." If your database has a long-running query, the query is interrupted during a reconfiguration and needs to be restarted.
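The averages quoted above can be turned into a rough planning number for how long a client might need to keep retrying. The following is a back-of-the-envelope sketch, not a service guarantee: multiplying averages gives only an estimate, and the "slow case" assumes every failover takes the full 30 seconds mentioned as the typical upper bound. The function and constant names are illustrative.

```python
# Figures quoted in this article.
AVG_FAILOVERS_PER_EVENT = 1.7
AVG_FAILOVER_SECONDS = 8
TYPICAL_MAX_FAILOVER_SECONDS = 30


def expected_interruption_seconds(failovers: float, seconds_per_failover: float) -> float:
    """Rough total reconfiguration time for one maintenance event."""
    return failovers * seconds_per_failover


# Estimate using the average failover duration.
average_case = expected_interruption_seconds(AVG_FAILOVERS_PER_EVENT, AVG_FAILOVER_SECONDS)

# Estimate assuming each failover takes the typical upper bound (an assumption).
slow_case = expected_interruption_seconds(AVG_FAILOVERS_PER_EVENT, TYPICAL_MAX_FAILOVER_SECONDS)

print(f"average case: {average_case:.1f} s")
print(f"slow case: {slow_case:.1f} s")
```

Under these assumptions, a total retry budget somewhat above the slow-case figure would cover most planned maintenance events.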

How to simulate a planned maintenance event

Ensuring that your client application is resilient to maintenance events before deploying to production helps mitigate the risk of application faults and contributes to application availability for your end users. You can test the behavior of your client application during planned maintenance events by initiating a manual failover via PowerShell, CLI, or REST API. A manual failover produces the same behavior as a maintenance event that brings the primary replica offline.
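Before running a manual failover against a real database, it can help to exercise the client's reconnect path in a unit test. The sketch below is a local test double only, and it does not call Azure. `FakeFailoverDatabase`, `FakeClock`, and `DatabaseUnavailableError` are hypothetical names invented for this example; the double refuses connections for a configurable window, similar to the period in which the new primary replica is not yet online.

```python
class DatabaseUnavailableError(Exception):
    """Hypothetical stand-in for a driver error carrying SQL error number 40613."""

    number = 40613


class FakeClock:
    """Manually advanced clock so tests do not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class FakeFailoverDatabase:
    """Test double that refuses connections while a simulated failover runs."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in seconds
        self._unavailable_until = 0.0

    def trigger_failover(self, duration_seconds=8.0):
        # The 8-second default mirrors the average quoted in this article.
        self._unavailable_until = self._clock() + duration_seconds

    def connect(self):
        if self._clock() < self._unavailable_until:
            raise DatabaseUnavailableError(
                "Database is not currently available. "
                "Please retry the connection later."
            )
        return "connection to new primary"
```

A test can call `trigger_failover()`, assert that `connect()` raises, advance the clock past the window, and assert that `connect()` succeeds again. This complements a real manual failover rather than replacing it.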

Retry logic

Any production client application that connects to a cloud database service should implement robust connection retry logic. Retry logic helps make failovers transparent to end users, or at least minimizes their negative effects.
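One common way to implement such retry logic is exponential backoff with jitter. The following is a minimal sketch under stated assumptions, not a prescribed implementation: `TransientDatabaseError` is a hypothetical stand-in for whatever exception your database driver raises, `run_with_retry` and its parameters are illustrative, and only error 40613 is treated as transient because it is the only error number this article names. Real drivers surface additional transient errors that you would add to the set.

```python
import random
import time

DATABASE_UNAVAILABLE = 40613  # error number quoted in this article

# Assumption: extend this set with the transient errors your driver reports.
TRANSIENT_ERROR_NUMBERS = {DATABASE_UNAVAILABLE}


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying a SQL error number."""

    def __init__(self, number, message=""):
        super().__init__(message or f"SQL error {number}")
        self.number = number


def run_with_retry(operation, max_attempts=5, base_delay=1.0,
                   max_delay=30.0, sleep=time.sleep):
    """Call operation(); retry on transient errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientDatabaseError as exc:
            out_of_attempts = attempt == max_attempts
            if exc.number not in TRANSIENT_ERROR_NUMBERS or out_of_attempts:
                raise  # not retryable, or retry budget exhausted
            # Double the delay on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from reconnecting in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Wrap the connection attempt, and any idempotent query, in a callable and pass it to `run_with_retry`. The `sleep` parameter is injectable so that tests can substitute a stub instead of waiting. Long-running queries interrupted by a reconfiguration still need to be restarted from the beginning.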

Resource health

If your database is experiencing login failures, check the Resource Health window in the Azure portal for the current status. The Health History section contains the downtime reason for each event (when available).

Next steps