Protect VMs deployed on Azure Stack Hub

Use this article as a guide to help you develop a data protection and disaster recovery strategy for user-deployed IaaS virtual machines (VMs) on Azure Stack Hub.

To protect against data loss and extended downtime, implement a backup-recovery or disaster-recovery plan for user applications and their data. Evaluate each application as part of your organization's comprehensive business continuity and disaster recovery (BC/DR) strategy. A good starting point is Azure Stack Hub: Considerations for business continuity and disaster recovery.

Considerations for protecting IaaS VMs

Roles and responsibilities

First, make sure that application owners and operators clearly understand their respective roles and responsibilities for protection and recovery.

Users are responsible for protecting VMs. Operators are responsible for keeping Azure Stack Hub online and healthy. Azure Stack Hub includes a service that backs up internal service data from infrastructure services. This service does not back up any user data, such as user-created VMs, storage accounts with user or application data, or user databases.

Application owner/architect
  • Align application architecture with cloud design principles.
  • Modernize traditional applications as required to prepare them for the cloud environment.
  • Define acceptable RTO and RPO for the application.
  • Identify application resources and data repositories that need to be protected.
  • Implement a data and application recovery method that best aligns with the application architecture and customer requirements.

Azure Stack Hub operator
  • Identify the organization's business continuity and disaster recovery goals.
  • Deploy enough Azure Stack Hub instances to meet the organization's BC/DR goals.
  • Design and operate application/data protection infrastructure.
  • Provide managed solutions or self-service access to protection services.
  • Work with application owners/architects to understand application design and recommend protection strategies.
  • Enable infrastructure backup for service healing and cloud recovery.

Source/target combinations

Users who need to protect against a datacenter or site outage can use another Azure Stack Hub or Azure to provide high availability or quick recovery. With a primary and a secondary location, users can deploy applications in an active/active or active/passive configuration across the two environments. For less critical workloads, users can use capacity in the secondary location to restore applications from backup on demand.

One or more Azure Stack Hub clouds can be deployed to a datacenter. To survive a catastrophic disaster, deploy at least one Azure Stack Hub cloud in a different datacenter so that you can fail over workloads and minimize unplanned downtime. If you have only one Azure Stack Hub, consider using the Azure public cloud as your recovery cloud. Where your application can run depends on government regulations, corporate policies, and stringent latency requirements. You can choose the appropriate recovery location for each application. For example, applications in one subscription can back up data to another datacenter, while applications in another subscription replicate data to the Azure public cloud.
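The per-application choice of recovery location described above can be sketched as a small selection function. This is a minimal illustration, not an Azure Stack Hub API: the `Site` and `AppPolicy` types, their fields, and the rule that a second Azure Stack Hub is preferred over the Azure public cloud are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """A candidate recovery location (hypothetical model, not an Azure API)."""
    name: str
    kind: str          # "azure-stack-hub" or "azure"
    datacenter: str    # datacenter or region hosting the site
    latency_ms: float  # measured latency from the application's users


@dataclass
class AppPolicy:
    """Per-application constraints (hypothetical field names)."""
    name: str
    public_cloud_allowed: bool  # set by regulations and corporate policy
    max_latency_ms: float       # the application's latency requirement


def choose_recovery_site(app: AppPolicy, primary_datacenter: str,
                         sites: list[Site]) -> Optional[Site]:
    """Return the lowest-latency eligible recovery site, or None.

    Assumed preference: an Azure Stack Hub in a different datacenter first,
    then the Azure public cloud if the application's policy allows it.
    """
    eligible = [s for s in sites
                if s.datacenter != primary_datacenter
                and s.latency_ms <= app.max_latency_ms]
    hubs = [s for s in eligible if s.kind == "azure-stack-hub"]
    if hubs:
        return min(hubs, key=lambda s: s.latency_ms)
    if app.public_cloud_allowed:
        clouds = [s for s in eligible if s.kind == "azure"]
        if clouds:
            return min(clouds, key=lambda s: s.latency_ms)
    return None  # no compliant recovery location: revisit the BC/DR plan
```

With only one Azure Stack Hub (in the primary datacenter), the function falls back to Azure for an application whose policy permits the public cloud, and returns `None` for one that doesn't.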

Application recovery objectives

Application owners are primarily responsible for determining how much downtime and data loss the application and the organization can tolerate. By quantifying acceptable downtime and acceptable data loss, you can create a recovery plan that minimizes the impact of a disaster on your organization. For each application, consider the following:

  • Recovery time objective (RTO)
    RTO is the maximum acceptable time that an app can be unavailable after an incident. For example, an RTO of 90 minutes means that you must be able to restore the app to a running state within 90 minutes from the start of a disaster. If you have a low RTO, you might keep a second deployment continually running on standby to protect against a regional outage.
  • Recovery point objective (RPO)
    RPO is the maximum duration of data loss that is acceptable during a disaster. For example, if you store data in a single database that is backed up hourly and not replicated to other databases, you could lose up to an hour of data.
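The two definitions above reduce to simple time arithmetic. The following minimal sketch, assuming periodic backups with no replication (as in the RPO example), computes how much data a given failure loses and checks a recovery against the RTO. The function names are illustrative only.

```python
from datetime import datetime, timedelta


def data_lost(backup_times: list[datetime], failure_time: datetime) -> timedelta:
    """Data written after the last backup before the failure is lost.

    Assumes periodic backups and no replication to other databases.
    """
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup was taken before the failure")
    return failure_time - max(earlier)


def meets_rpo(backup_times: list[datetime], failure_time: datetime,
              rpo: timedelta) -> bool:
    """True if the data lost in this failure stays within the RPO."""
    return data_lost(backup_times, failure_time) <= rpo


def meets_rto(disaster_start: datetime, restored_at: datetime,
              rto: timedelta) -> bool:
    """True if the app was running again within the RTO of the disaster start."""
    return restored_at - disaster_start <= rto
```

With hourly backups at 00:00, 01:00, and 02:00, a failure at 02:59 loses 59 minutes of data, which is within a one-hour RPO; a restore that finishes 85 minutes later meets a 90-minute RTO.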

Another metric is mean time to recover (MTTR), the average time it takes to restore the application after a failure. MTTR is an empirical value for a system. If MTTR exceeds the RTO, a failure in the system causes an unacceptable business disruption, because the system can't be restored within the defined RTO.
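Because MTTR is measured rather than chosen, one way to apply the rule above is to average the restore durations observed in past incidents or recovery drills and compare the result with the RTO. This is a minimal sketch; the function names and the drill data are illustrative. An average under the RTO does not guarantee that every recovery fits, so the check only flags the case the text describes.

```python
from datetime import timedelta


def mean_time_to_recover(recoveries: list[timedelta]) -> timedelta:
    """MTTR: the average of observed restore durations."""
    if not recoveries:
        raise ValueError("MTTR needs at least one observed recovery")
    return sum(recoveries, timedelta()) / len(recoveries)


def mttr_within_rto(recoveries: list[timedelta], rto: timedelta) -> bool:
    """False means the average recovery already exceeds the RTO."""
    return mean_time_to_recover(recoveries) <= rto
```

For example, drills that took 60, 75, and 105 minutes give an MTTR of 80 minutes, which fits a 90-minute RTO; drills of 80, 100, and 120 minutes give 100 minutes, which does not.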

Protection options

Backup-restore

Backing up your applications and datasets enables you to quickly recover from downtime caused by data corruption, accidental deletion, or disasters. For IaaS VM-based applications, you can use an in-guest agent to protect application data, operating system configuration, and data stored on volumes.

Backup using an in-guest agent

Backing up a VM with a guest OS agent typically includes capturing operating system configuration, files/folders, disks, application binaries, or application data.

Recovering an application from an agent backup requires manually recreating the VM, installing the operating system, and installing the guest agent. After that, data can be restored into the guest OS or directly to the application.

Backup using disk snapshots for stopped VMs

Backup products can protect the IaaS VM configuration and the disks attached to a stopped VM. Use backup products that integrate with Azure Stack Hub APIs to capture VM configuration and create disk snapshots. If planned downtime for the application is possible, make sure the VM is in a stopped state before starting the backup workflow.

Backup using disk snapshots for running VMs

Important

Using disk snapshots is currently not supported for VMs in a running state. Creating a snapshot of a disk attached to a running VM may degrade performance or affect the availability of the operating system or application in the VM. If planned downtime is not an option, the recommendation is to use an in-guest agent to protect the application.
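The guidance in the two sections above amounts to a small decision rule. The following sketch encodes it; the `BackupMethod` names are illustrative labels, not options of any particular backup product.

```python
from enum import Enum


class BackupMethod(Enum):
    DISK_SNAPSHOT = "snapshot the disks of the stopped VM"
    STOP_THEN_SNAPSHOT = "stop the VM during planned downtime, then snapshot"
    IN_GUEST_AGENT = "back up with an in-guest agent"


def choose_backup_method(vm_running: bool, planned_downtime_ok: bool) -> BackupMethod:
    """Pick a VM backup method that avoids snapshots of running VMs."""
    if not vm_running:
        return BackupMethod.DISK_SNAPSHOT
    if planned_downtime_ok:
        # Make sure the VM is stopped before the snapshot workflow starts.
        return BackupMethod.STOP_THEN_SNAPSHOT
    # Snapshots of disks attached to a running VM are not supported.
    return BackupMethod.IN_GUEST_AGENT
```

The only path that takes a snapshot of a running VM's disks simply doesn't exist. A VM that can't be stopped always falls through to the in-guest agent.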

VMs in a scale set or availability set

VMs in a scale set or availability set that are considered ephemeral resources should not be backed up at the VM level, especially if the application is stateless. For stateful applications deployed in a scale set or availability set, consider protecting the application data (for example, a database or a volume in a storage pool).
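The scoping rule above can be sketched as a function that turns a VM inventory into a list of items to back up. The `VmInfo` type and the `vm:`/`data:` labels are hypothetical. Treating standalone VMs as whole-VM backup candidates is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VmInfo:
    """Hypothetical inventory record for one VM."""
    name: str
    in_scale_or_availability_set: bool
    stateful: bool
    data_stores: list[str] = field(default_factory=list)  # databases, storage-pool volumes


def backup_targets(vms: list[VmInfo]) -> list[str]:
    """Return what to protect: whole standalone VMs, or app data for stateful sets."""
    targets: set[str] = set()
    for vm in vms:
        if not vm.in_scale_or_availability_set:
            targets.add(f"vm:{vm.name}")
        elif vm.stateful:
            # Protect the shared application data, not each ephemeral instance.
            targets.update(f"data:{store}" for store in vm.data_stores)
        # Stateless instances in a scale set or availability set are skipped.
    return sorted(targets)
```

Collecting targets in a set means that instances sharing a database, such as two SQL Server VMs in the same availability set, produce a single `data:` entry.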

Replication/manual failover

For applications that require minimal data loss or minimal downtime, you can enable data replication at the guest OS or application level to replicate data to another location. Some applications, such as Microsoft SQL Server, natively support replication. If the application doesn't support replication, you can use software in the guest OS to replicate disks, or use a partner solution that installs as an agent in the guest OS.

With this approach, the application is deployed in one cloud, and its data is replicated to another on-premises cloud or to Azure. When a failover is triggered, the application in the target must be started and attached to the replicated data before it can start servicing requests.
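The failover precondition described above, that the target application must be both started and attached to the replicated data before it services requests, can be modeled as follows. `StandbyApp` and `fail_over` are hypothetical names. Real failover is performed with the replication product or application tooling you use.

```python
from typing import Optional


class StandbyApp:
    """Hypothetical model of the application copy in the recovery cloud."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = False
        self.replica: Optional[str] = None  # replicated volume or database

    def start(self) -> None:
        self.running = True

    def attach(self, replica: str) -> None:
        self.replica = replica

    def serve(self, request: str) -> str:
        # Requests are refused until the app is started and attached to data.
        if not self.running or self.replica is None:
            raise RuntimeError(f"{self.name} is not ready to service requests")
        return f"{self.name} handled {request!r} using {self.replica}"


def fail_over(app: StandbyApp, replica: str) -> StandbyApp:
    """Manual failover: start the target app and attach the replicated data."""
    app.start()
    app.attach(replica)
    return app
```

Calling `serve` on the standby before `fail_over` raises an error. After failover, the same call succeeds against the replica.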

High availability/automatic failover

For stateless applications that can tolerate only a few seconds or minutes of downtime, consider a high-availability configuration. High-availability applications are designed to be deployed to multiple locations in an active/active topology, in which all instances can service requests. For local hardware faults, the Azure Stack Hub infrastructure implements high availability in the physical network by using two top-of-rack switches. For compute-level faults, Azure Stack Hub uses multiple nodes in a scale unit and automatically fails over a VM. At the VM level, you can use scale sets or VMs in an availability set to ensure that node failures don't take down your application. The same application must also be deployed to a secondary location in the same configuration. To make the application active/active, use a load balancer or DNS to direct requests to all available instances.
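Directing requests to all available instances in an active/active deployment can be illustrated with a round-robin balancer that skips unhealthy instances. This is a conceptual sketch of the technique, not the behavior of the Azure Stack Hub load balancer or any DNS product. The `Instance` type and the addresses are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    location: str   # e.g. the primary or secondary Azure Stack Hub
    address: str
    healthy: bool = True


class RoundRobinBalancer:
    """Spread requests across every healthy instance in all locations."""

    def __init__(self, instances: list[Instance]) -> None:
        self.instances = instances
        self._counter = count()

    def pick(self) -> Instance:
        available = [i for i in self.instances if i.healthy]
        if not available:
            raise RuntimeError("no healthy instances in any location")
        return available[next(self._counter) % len(available)]
```

While both locations are healthy, requests alternate between them. If one location fails its health checks, every request goes to the remaining location without a manual failover step.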

No recovery

Some apps in your environment may not need protection against unplanned downtime or data loss. For example, VMs used for development and testing typically don't need to be recovered. Whether to leave an application or dataset unprotected is your decision.

Important considerations for your Azure Stack Hub deployment:

  • Back up/restore VMs to an external backup target already deployed in your datacenter
    Recommended. Take advantage of existing backup infrastructure and operational skills. Size the backup infrastructure so that it's ready to protect the additional VM instances. Make sure the backup infrastructure isn't in close proximity to your source. You can restore VMs to the source Azure Stack Hub, to a secondary Azure Stack Hub instance, or to Azure.
  • Back up/restore VMs to an external backup target dedicated to Azure Stack Hub
    Recommended. You can purchase new backup infrastructure or provision dedicated backup infrastructure for Azure Stack Hub. Make sure the backup infrastructure isn't in close proximity to your source. You can restore VMs to the source Azure Stack Hub, to a secondary Azure Stack Hub instance, or to Azure.
  • Back up/restore VMs directly to global Azure or a trusted service provider
    Recommended. As long as you can meet your data privacy and regulatory requirements, you can store your backups in global Azure or with a trusted service provider. Ideally, the service provider also runs Azure Stack Hub, so you get a consistent operational experience when you restore.
  • Replicate/fail over VMs to a separate Azure Stack Hub instance
    Recommended. For failover, you need a second Azure Stack Hub cloud that is fully operational so that you can avoid extended app downtime.
  • Replicate/fail over VMs directly to Azure or a trusted service provider
    Recommended. As long as you can meet your data privacy and regulatory requirements, you can replicate your data to global Azure or to a trusted service provider. Ideally, the service provider also runs Azure Stack Hub, so you get a consistent operational experience after failover.
  • Deploy a backup target on the same Azure Stack Hub that hosts all the applications it protects
    Stand-alone target: Not recommended. Target that replicates backup data externally: Recommended. If you choose to deploy a backup appliance on Azure Stack Hub (to optimize operational restores), you must ensure that all data is continuously copied to an external backup location.
  • Deploy a physical backup appliance into the same rack where the Azure Stack Hub solution is installed
    Not supported. Currently, you can't connect devices that aren't part of the original solution to the top-of-rack switches.
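The backup-target recommendations above can be encoded as a lookup for reviewing a proposed design. This is a minimal sketch: the `Placement` names, the `compliant` flag, and the "not an option" label for a non-compliant Azure or service provider target are assumptions introduced for illustration. The replication/failover options are not covered.

```python
from enum import Enum


class Placement(Enum):
    EXTERNAL_EXISTING = "external target already in the datacenter"
    EXTERNAL_DEDICATED = "external target dedicated to Azure Stack Hub"
    AZURE_OR_PROVIDER = "global Azure or a trusted service provider"
    SAME_HUB = "backup target on the same Azure Stack Hub"
    SAME_RACK_APPLIANCE = "physical appliance in the Azure Stack Hub rack"


def assess(placement: Placement, replicates_externally: bool = False,
           compliant: bool = True) -> str:
    """Map a backup-target placement to the recommendation listed above."""
    if placement is Placement.SAME_RACK_APPLIANCE:
        # Extra devices can't be connected to the top-of-rack switches.
        return "not supported"
    if placement is Placement.SAME_HUB:
        # Acceptable only if all backup data is continuously copied off the hub.
        return "recommended" if replicates_externally else "not recommended"
    if placement is Placement.AZURE_OR_PROVIDER and not compliant:
        # Depends on meeting data privacy and regulatory requirements.
        return "not an option"
    return "recommended"
```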

Next steps

This article provided general guidelines for protecting user VMs deployed on Azure Stack Hub. For information about using Azure services to protect user VMs, refer to:

  • Azure Backup Server
  • Azure Site Recovery
  • Partner products