Monitor Site Recovery

In this article, learn how to monitor Azure Site Recovery by using its built-in monitoring. You can monitor:

  • The health and status of machines replicated by Site Recovery.
  • The test failover status of machines.
  • Issues and errors affecting configuration and replication.
  • Infrastructure components such as on-premises servers.

Before you start

You might want to review common monitoring questions before you start.

Monitor in the dashboard

  1. In the vault, click Overview. The Recovery Services dashboard consolidates all monitoring information for the vault in a single location. There are pages for both Site Recovery and the Azure Backup service, and you can switch between them.

    Site Recovery dashboard

  2. From the dashboard, drill down into different areas.

    Site Recovery dashboard

  3. In Replicated items, click View All to see all the servers in the vault.

  4. Click the status details in each section to drill down.

  5. In Infrastructure view, sort monitoring information by the type of machines you're replicating.

Monitor replicated items

In Replicated items, monitor the health of all machines in the vault that have replication enabled.

State | Details
Healthy | Replication is progressing normally. No error or warning symptoms are detected.
Warning | One or more warning symptoms that might impact replication are detected.
Critical | One or more critical replication error symptoms have been detected. These error symptoms typically indicate that replication is stuck or isn't progressing as fast as the data change rate.
Not applicable | Servers that aren't currently expected to be replicating. This might include machines that have been failed over.
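
The state table above can be read as a simple roll-up from per-machine symptoms to one health state. The following is a minimal sketch of that reading, not the service's actual logic. The function name, the severity strings, and the rule that a critical symptom takes precedence over a warning are all assumptions made for illustration.

```python
def replication_health(symptom_severities, expected_to_replicate=True):
    """Roll a machine's symptom severities up into one health state.

    Assumption: when both warning and critical symptoms are present,
    the machine is reported as Critical.
    """
    if not expected_to_replicate:
        # For example, a machine that has already been failed over.
        return "Not applicable"
    if "Critical" in symptom_severities:
        return "Critical"
    if "Warning" in symptom_severities:
        return "Warning"
    return "Healthy"


# Hypothetical sample inputs.
healthy = replication_health([])
mixed = replication_health(["Warning", "Critical"])
failed_over = replication_health(["Warning"], expected_to_replicate=False)
```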

Monitor test failovers

In Failover test success, monitor the failover status for machines in the vault.

  • We recommend that you run a test failover on replicated machines at least once every six months. It's a way to check that failover is working as expected, without disrupting your production environment.
  • A test failover is considered successful only after the failover and post-failover cleanup have completed successfully.

State | Details
Test recommended | Machines that haven't had a test failover since protection was enabled.
Performed successfully | Machines with one or more successful test failovers.
Not applicable | Machines that aren't currently eligible for a test failover. For example, machines that are failed over, or that have initial replication, a test failover, or a failover in progress.
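
As a rough illustration of how the three states relate, the sketch below classifies a machine from a few flags. The `Machine` fields and both function names are hypothetical and don't come from any Site Recovery API. The `retest_due` helper is a separate assumption: it approximates the six-month recommendation as 182 days and isn't one of the dashboard states.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Machine:
    name: str
    failed_over: bool = False
    operation_in_progress: bool = False  # initial replication, test failover, or failover
    last_successful_test_failover: Optional[datetime] = None


def failover_test_state(m: Machine) -> str:
    """Map a machine to one of the states in the table above."""
    if m.failed_over or m.operation_in_progress:
        return "Not applicable"
    if m.last_successful_test_failover is None:
        return "Test recommended"
    return "Performed successfully"


def retest_due(m: Machine, now: datetime, interval: timedelta = timedelta(days=182)) -> bool:
    """True when the last successful test is older than roughly six months."""
    last = m.last_successful_test_failover
    return last is not None and now - last > interval


# Hypothetical sample data.
now = datetime(2024, 7, 1)
never_tested = Machine("web-01")
tested_long_ago = Machine("db-01", last_successful_test_failover=datetime(2023, 11, 1))
already_failed_over = Machine("app-01", failed_over=True)
```

Keeping the six-month check separate from the state function reflects that the table defines "Test recommended" only in terms of whether a test has ever run.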

Monitor configuration issues

In Configuration issues, monitor any issues that might impact your ability to fail over successfully.

  • Configuration issues (except for software update availability) are detected by a periodic validator operation that runs every 12 hours by default. You can force the validator operation to run immediately by clicking the refresh icon next to the Configuration issues section heading.
  • Click the links to get more details. For issues impacting specific machines, click needs attention in the Target configurations column. Details include remediation recommendations.

State | Details
Missing configurations | A necessary setting is missing, such as a recovery network or a resource group.
Missing resources | A specified resource can't be found or isn't available in the subscription. For example, the resource was deleted or migrated. Monitored resources include the target resource group, target VNet/subnet, log/target storage account, target availability set, and target IP address.
Subscription quota | The available subscription resource quota balance is compared against the balance needed to fail over all of the machines in the vault. If there aren't enough resources, an insufficient quota balance is reported. Quotas are monitored for VM core count, VM family core count, and network interface card (NIC) count.
Software updates | The availability of new software updates, and information about expiring software versions.
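
The subscription quota check described above amounts to comparing an available balance with a required balance for each quota. Here is a minimal sketch of that comparison. The function name, the quota keys, and the numbers are hypothetical, and the real validator may compute the required balance differently.

```python
def quota_shortfalls(available, required):
    """Return {quota_name: shortfall} for each quota whose available
    balance is lower than the balance needed to fail over every machine."""
    return {
        name: needed - available.get(name, 0)
        for name, needed in required.items()
        if available.get(name, 0) < needed
    }


# Hypothetical balances for VM cores, one VM family's cores, and NICs.
available = {"vm_cores": 20, "dsv3_family_cores": 8, "nics": 50}
required = {"vm_cores": 16, "dsv3_family_cores": 12, "nics": 10}

shortfalls = quota_shortfalls(available, required)
```

An empty result means every monitored quota has enough headroom. Any entry corresponds to an insufficient quota balance that would be reported.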

Monitor errors

In Error summary, monitor currently active error symptoms that might impact replication of servers in the vault, and monitor the number of impacted machines.

  • Errors impacting on-premises infrastructure components are shown at the beginning of the section. For example, non-receipt of a heartbeat from the Azure Site Recovery Provider on the on-premises configuration server or Hyper-V host.
  • Next, replication error symptoms impacting replicated servers are shown.
  • The table entries are sorted in decreasing order of error severity, and then in decreasing order of the number of impacted machines.
  • The impacted server count is a useful way to understand whether a single underlying issue might impact multiple machines. For example, a network glitch could potentially impact all machines that replicate to Azure.
  • Multiple replication errors can occur on a single server. In this case, each error symptom counts that server in the list of its impacted servers. After the issue is fixed, replication parameters improve, and the error is cleared from the machine.
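
The ordering and counting rules in the list above can be sketched as a small grouping and sorting step. This is an illustration only. The `ErrorSymptom` type, the severity ranking, and the sample error names are assumptions and don't reflect the service's data model.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical ranking: a higher number means a more severe symptom.
SEVERITY_RANK = {"Critical": 2, "Warning": 1}


@dataclass(frozen=True)
class ErrorSymptom:
    server: str
    error: str
    severity: str


def summarize_errors(symptoms):
    """Group symptoms by error, then sort by severity and by impacted
    server count, both in decreasing order."""
    impacted = defaultdict(set)
    severity = {}
    for s in symptoms:
        # A server with several errors is counted once under each error.
        impacted[s.error].add(s.server)
        severity[s.error] = s.severity
    rows = [(err, severity[err], len(servers)) for err, servers in impacted.items()]
    rows.sort(key=lambda row: (-SEVERITY_RANK[row[1]], -row[2]))
    return rows


# Hypothetical sample data: web-01 has two different errors.
symptoms = [
    ErrorSymptom("web-01", "High data change rate", "Warning"),
    ErrorSymptom("web-01", "No heartbeat", "Critical"),
    ErrorSymptom("db-01", "High data change rate", "Warning"),
    ErrorSymptom("db-02", "High data change rate", "Warning"),
]
error_summary = summarize_errors(symptoms)
```

In this sample, the critical error is listed first even though it affects a single server, and `web-01` contributes to the count of both errors.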

Monitor the infrastructure

In Infrastructure view, monitor the infrastructure components involved in replication, and the connectivity health between servers and the Azure services.

  • A green line indicates that the connection is healthy.

  • A red line with an overlaid error icon indicates the existence of one or more error symptoms that impact connectivity.

  • Hover the mouse pointer over the error icon to show the error and the number of impacted entities. Click the icon for a filtered list of impacted entities.

    Site Recovery infrastructure view (vault)

Tips for monitoring the infrastructure

  • Make sure that the on-premises infrastructure components (configuration server, process servers, VMM servers, Hyper-V hosts, VMware machines) are running the latest versions of the Site Recovery Provider and/or agents.

  • To use all the features in the infrastructure view, you should be running Update rollup 22 for these components.

  • To use the infrastructure view, select the appropriate replication scenario in your environment. You can drill down in the view for more details. The following table shows which scenarios are represented.

    Scenario | State | View available?
    Replication between on-premises sites | All states | No
    Azure VM replication between Azure regions | Replication enabled/initial replication in progress | Yes
    Azure VM replication between Azure regions | Failed over/failed back | No
    VMware replication to Azure | Replication enabled/initial replication in progress | Yes
    VMware replication to Azure | Failed over/failed back | No
    Hyper-V replication to Azure | Failed over/failed back | No

  • To see the infrastructure view for a single replicating machine, in the vault menu, click Replicated items, and select a server.

Monitor recovery plans

In Recovery plans, monitor the number of plans, create new plans, and modify existing ones.

Monitor jobs

In Jobs, monitor the status of Site Recovery operations.

  • Most operations in Azure Site Recovery are executed asynchronously, with a tracking job being created and used to track the progress of the operation.
  • The job object has all the information you need to track the state and the progress of the operation.

Monitor jobs as follows:

  1. In the dashboard > Jobs section, you can see a summary of jobs that have completed, are in progress, or are waiting for input, in the last 24 hours. You can click any state to get more information about the relevant jobs.

  2. Click View all to see all jobs in the last 24 hours.

    Note

    You can also access job information from the vault menu > Site Recovery Jobs.

  3. In the Site Recovery Jobs list, a list of jobs is displayed. On the top menu, you can get error details for a specific job, filter the jobs list based on specific criteria, and export selected job details to Excel.

  4. You can drill into a job by clicking it.
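
The 24-hour job summary in step 1 is essentially a count of jobs per state within a time window. The sketch below shows that idea on plain tuples. The function name, the state strings, and the use of a job's start time as the window criterion are assumptions, not the behavior of any Site Recovery API.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize_recent_jobs(jobs, now, window=timedelta(hours=24)):
    """Count jobs by state, keeping only jobs that started within the window.

    Each job is a (state, started_at) tuple.
    """
    cutoff = now - window
    return Counter(state for state, started_at in jobs if started_at >= cutoff)


# Hypothetical sample data.
now = datetime(2024, 1, 2, 12, 0)
jobs = [
    ("Completed", datetime(2024, 1, 2, 9, 0)),
    ("In progress", datetime(2024, 1, 2, 11, 30)),
    ("Waiting for input", datetime(2024, 1, 1, 13, 0)),
    ("Completed", datetime(2024, 1, 1, 8, 0)),  # older than 24 hours, excluded
]
job_summary = summarize_recent_jobs(jobs, now)
```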

Monitor virtual machines

In Replicated items, get a list of replicated machines.

Site Recovery replicated items list view

  1. You can view and filter information. On the action menu at the top, you can perform actions for a particular machine, including running a test failover, or viewing specific errors.
  2. Click Columns to show additional columns, for example to show RPO, target configuration issues, and replication errors.
  3. Click Filter to view information based on specific parameters such as replication health, or a particular replication policy.
  4. Right-click a machine to initiate operations such as a test failover, or to view specific error details associated with it.
  5. Click a machine to drill into more details for it. Details include:
    • Replication information: Current status and health of the machine.

    • RPO (recovery point objective): Current RPO for the virtual machine and the time at which the RPO was last computed.

    • Recovery points: Latest available recovery points for the machine.

    • Failover readiness: Indicates whether a test failover was run for the machine, the agent version running on the machine (for machines running the Mobility service), and any configuration issues.

    • Errors: List of replication error symptoms currently observed on the machine, and possible causes/actions.

    • Events: A chronological list of recent events impacting the machine. The error details show the currently observable error symptoms, while events are a historical record of issues that have impacted the machine.

    • Infrastructure view: Shows the state of the infrastructure for the scenario when machines are replicating to Azure.

      Azure Site Recovery replicated item details/overview