High availability in Azure Database for MariaDB

The Azure Database for MariaDB service provides a guaranteed high level of availability, with a financially backed service level agreement (SLA) of 99.99% uptime. Azure Database for MariaDB provides high availability during planned events, such as a user-initiated scale compute operation, and also when unplanned events, such as underlying hardware, software, or network failures, occur. It can quickly recover from most critical circumstances, ensuring virtually no application downtime when using this service.
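To make the 99.99% figure concrete, the short Python sketch that follows converts an uptime percentage into a downtime budget. It assumes a 30-day month and a 365-day year purely for illustration; how the SLA actually measures availability, and what it excludes, is defined in the Azure SLA terms, not by this calculation.

```python
from datetime import timedelta


def allowed_downtime(sla_percent: float, period: timedelta) -> timedelta:
    """Maximum downtime an uptime percentage allows over the given period."""
    unavailable_fraction = 1 - sla_percent / 100
    return period * unavailable_fraction


if __name__ == "__main__":
    # Illustrative periods only; the SLA defines its own measurement window.
    for label, period in [("30-day month", timedelta(days=30)),
                          ("365-day year", timedelta(days=365))]:
        budget = allowed_downtime(99.99, period)
        print(f"{label}: {budget.total_seconds() / 60:.2f} minutes")
```

At 99.99%, this works out to roughly 4.3 minutes per 30-day month and about 52.6 minutes per 365-day year.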

Azure Database for MariaDB is suitable for running mission-critical databases that require high uptime. Built on Azure architecture, the service has inherent high availability, redundancy, and resiliency capabilities that mitigate database downtime from planned and unplanned outages, without requiring you to configure any additional components.

Components in Azure Database for MariaDB

Component | Description
MariaDB Database Server | Azure Database for MariaDB provides security, isolation, resource safeguards, and fast restart capability for database servers. These capabilities allow operations such as scaling, and recovering a database server after an outage, to complete in seconds. Data modifications in the database server typically occur in the context of a database transaction. All database changes are recorded synchronously as write-ahead logs (ib_log) on Azure Storage, which is attached to the database server. During the database checkpoint process, data pages from the database server memory are also flushed to storage.
Remote Storage | All MariaDB physical data files and log files are stored on Azure Storage, which is architected to store three copies of data within a region to ensure data redundancy, availability, and reliability. The storage layer is also independent of the database server: it can be detached from a failed database server and reattached to a new database server within a few seconds. Azure Storage also continuously monitors for storage faults. If block corruption is detected, it is automatically fixed by instantiating a new storage copy.
Gateway | The Gateway acts as a database proxy and routes all client connections to the database server.
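The self-healing behavior described for Remote Storage can be illustrated with a toy model: each block is written to three replicas together with a checksum, reads are served from the first copy whose checksum still matches, and any missing or corrupted copy is rewritten from a healthy one. This is a conceptual Python sketch only; the class and method names are hypothetical, and it does not describe how Azure Storage is actually implemented.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Replica:
    # block_id -> (data, checksum recorded at write time)
    blocks: dict = field(default_factory=dict)

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = (data, checksum(data))

    def is_healthy(self, block_id: int) -> bool:
        entry = self.blocks.get(block_id)
        return entry is not None and checksum(entry[0]) == entry[1]


class ReplicatedStore:
    """Toy model: every block is written to `copies` replicas (three by default)."""

    def __init__(self, copies: int = 3):
        self.replicas = [Replica() for _ in range(copies)]

    def write(self, block_id: int, data: bytes) -> None:
        for replica in self.replicas:
            replica.write(block_id, data)

    def read(self, block_id: int) -> bytes:
        # Serve from the first copy whose checksum matches, then repair the rest.
        for replica in self.replicas:
            if replica.is_healthy(block_id):
                data = replica.blocks[block_id][0]
                self.repair(block_id, data)
                return data
        raise OSError(f"all copies of block {block_id} are lost")

    def repair(self, block_id: int, good_data: bytes) -> None:
        # Recreate any missing or corrupted copy from known-good data.
        for replica in self.replicas:
            if not replica.is_healthy(block_id):
                replica.write(block_id, good_data)


if __name__ == "__main__":
    store = ReplicatedStore()
    store.write(7, b"page contents")
    # Corrupt one copy (data changes, stored checksum does not) and lose another.
    _, digest = store.replicas[0].blocks[7]
    store.replicas[0].blocks[7] = (b"garbage", digest)
    del store.replicas[2].blocks[7]
    print(store.read(7))                                 # served by the surviving copy
    print(all(r.is_healthy(7) for r in store.replicas))  # copies repaired
```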

Unplanned downtime mitigation

Unplanned downtime can occur as a result of unforeseen failures, including underlying hardware faults, networking issues, and software bugs. If the database server goes down unexpectedly, a new database server is automatically provisioned in seconds, and the remote storage is automatically attached to it. The MariaDB engine performs recovery using the write-ahead log (WAL) and database files, and then opens the database server to allow clients to connect. Uncommitted transactions are lost and must be retried by the application. While unplanned downtime cannot be avoided entirely, Azure Database for MariaDB mitigates it by automatically performing recovery operations at both the database server and storage layers, without requiring human intervention.
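The recovery rule in the paragraph above (committed work survives, uncommitted work is lost) can be sketched as a redo-only replay of a write-ahead log on top of the last checkpoint. This is a simplified Python model with hypothetical names; InnoDB's real crash recovery is more involved (for example, it also uses undo logs to roll back changes from uncommitted transactions that reached disk), and the sketch assumes the checkpoint contains only committed data.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    txn_id: int
    kind: str        # "update" or "commit"
    key: str = ""
    value: str = ""


def recover(checkpoint: dict, wal: list) -> dict:
    """Redo-only recovery: replay logged updates of committed transactions.

    Updates from transactions with no commit record in the log are skipped,
    which is how uncommitted work is lost after a crash in this toy model.
    """
    committed = {r.txn_id for r in wal if r.kind == "commit"}
    state = dict(checkpoint)          # start from the last flushed data pages
    for record in wal:                # replay in log order
        if record.kind == "update" and record.txn_id in committed:
            state[record.key] = record.value
    return state


if __name__ == "__main__":
    checkpoint = {"alice": "100"}
    wal = [
        LogRecord(1, "update", "alice", "80"),
        LogRecord(1, "update", "bob", "20"),
        LogRecord(1, "commit"),
        LogRecord(2, "update", "alice", "0"),   # server fails before txn 2 commits
    ]
    print(recover(checkpoint, wal))
```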

Figure: View of high availability in Azure Database for MariaDB

Unplanned downtime: failure scenarios and service recovery

Here are some failure scenarios and how Azure Database for MariaDB automatically recovers from them:

Scenario | Automatic recovery
Database server failure | If the database server is down because of an underlying hardware fault, active connections are dropped and any in-flight transactions are aborted. A new database server is automatically deployed, and the remote data storage is attached to it. After database recovery is complete, clients can connect to the new database server through the Gateway. Applications that use MariaDB databases need to be built so that they detect and retry dropped connections and failed transactions. When the application retries, the Gateway transparently redirects the connection to the newly created database server.
Storage failure | Applications do not see any impact from storage-related issues such as a disk failure or physical block corruption. Because the data is stored in three copies, it is served by a surviving copy. Block corruption is automatically corrected, and if a copy of the data is lost, a new copy is automatically created.
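Because active connections are dropped and in-flight transactions are aborted during failover, the retry logic called for in the database server failure scenario can be as simple as re-running the whole transaction on a fresh connection with exponential backoff. The Python sketch that follows is a generic pattern, not an Azure or MariaDB API: `TransientDbError` is a hypothetical placeholder for whatever exceptions your driver raises for dropped connections (for a PEP 249 driver, often its `OperationalError`), and the attempt count and delays are illustrative defaults.

```python
import random
import time


class TransientDbError(Exception):
    """Hypothetical placeholder for a driver's dropped-connection errors."""


def run_with_retry(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                   sleep=time.sleep):
    """Call `operation` until it succeeds, backing off exponentially between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientDbError:
            if attempt == max_attempts:
                raise                      # give up and surface the error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out reconnects


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_transaction():
        # Stands in for "connect, run the transaction, commit".
        calls["count"] += 1
        if calls["count"] < 3:
            raise TransientDbError("connection dropped during failover")
        return "committed"

    print(run_with_retry(flaky_transaction, sleep=lambda _: None), calls["count"])
```

The `operation` callable should open its own connection and run the complete transaction, so each retry starts cleanly and the Gateway can route the new connection to the replacement server. Only retry work that is safe to repeat: if a failure happens after a commit was sent but before it was acknowledged, the transaction may already have been applied.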

Here are some failure scenarios that require user action to recover:

Scenario | Recovery plan
Region failure | Failure of a region is a rare event. However, if you need protection from a region failure, you can configure one or more read replicas in other regions for disaster recovery (DR). For details, see the article about creating and managing read replicas. In the event of a region-level failure, you can manually promote a read replica configured in another region to be your production database server.
Logical/user errors | Recovery from user errors, such as accidentally dropped tables or incorrectly updated data, involves performing a point-in-time recovery (PITR): restoring and recovering the data up to the time just before the error occurred. If you want to restore only a subset of databases or specific tables rather than all databases on the server, you can restore the database server to a new instance, export the tables with mysqldump, and then restore those tables into your database.
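The table-level restore workflow for logical/user errors can be scripted. The Python sketch that follows builds a mysqldump command that exports selected tables from the server restored to a point in time before the error, and a mysql client command that replays the dump into the production database. Host names, the user, and the helper names are hypothetical; credentials are assumed to come from a MariaDB/MySQL option file (such as ~/.my.cnf) rather than the command line.

```python
import shlex
import subprocess


def dump_tables_cmd(host, user, database, tables, out_file):
    """mysqldump arguments that export only `tables` from `database`."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",        # consistent snapshot of InnoDB tables
        f"--result-file={out_file}",
        database,
        *tables,
    ]


def load_dump_cmd(host, user, database):
    """mysql client arguments that replay a dump read from standard input."""
    return ["mysql", f"--host={host}", f"--user={user}", database]


def copy_tables(restored_host, target_host, user, database, tables,
                dump_path="tables.sql"):
    # Step 1: export the tables from the point-in-time restored server.
    subprocess.run(dump_tables_cmd(restored_host, user, database, tables, dump_path),
                   check=True)
    # Step 2: replay the dump into the production database.
    with open(dump_path, "rb") as dump:
        subprocess.run(load_dump_cmd(target_host, user, database),
                       stdin=dump, check=True)


if __name__ == "__main__":
    # Hypothetical servers; print the commands instead of running them.
    print(shlex.join(dump_tables_cmd("restored-server.example", "admin", "shop",
                                     ["orders"], "tables.sql")))
    print(shlex.join(load_dump_cmd("production-server.example", "admin", "shop")))
```

By default, mysqldump output drops and recreates each exported table, so replaying it replaces those tables in the target database; review the dump before running the second step against production.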

Summary

Azure Database for MariaDB provides fast restart capability for database servers, redundant storage, and efficient routing from the Gateway. For additional data protection, you can configure backups to be geo-replicated and also deploy one or more read replicas in other regions. With these inherent high availability capabilities, Azure Database for MariaDB protects your databases from the most common outages and offers an industry-leading, financially backed SLA of 99.99% uptime. All these availability and reliability capabilities enable Azure to be the ideal platform to run your mission-critical applications.

Next steps