Monitor the process server

This article describes how to monitor the Site Recovery process server.

  • The process server is used when you set up disaster recovery of on-premises VMware VMs and physical servers to Azure.
  • By default, the process server runs on the configuration server. It's installed automatically when you deploy the configuration server.
  • Optionally, to handle larger numbers of replicated machines and higher volumes of replication traffic, you can deploy additional scale-out process servers.

Learn more about the role and deployment of process servers.

Monitoring overview

The process server has several roles: it caches, compresses, and transfers replicated data to Azure. Because so much depends on it, it's important to monitor process server health on an ongoing basis.

Several situations commonly affect process server performance. Performance problems have a cascading effect on VM health and can eventually push both the process server and its replicated machines into a critical state. These situations include:

  • A large number of VMs use the process server, approaching or exceeding the recommended limits.
  • VMs that use the process server have a high churn (data change) rate.
  • Network throughput between the VMs and the process server isn't sufficient to upload replication data to the process server.
  • Network throughput between the process server and Azure isn't sufficient to upload replication data from the process server to Azure.
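The last two situations come down to simple arithmetic: the daily churn of all VMs that share a process server must fit through the available network bandwidth. The following is a minimal back-of-envelope sketch. The churn figures and link speeds are hypothetical inputs you supply yourself; they are not values read from Site Recovery. The sketch also ignores compression, which reduces the traffic from the process server to Azure, and it ignores churn peaks, which can far exceed the daily average.

```python
"""Back-of-envelope check: can a network link keep up with VM churn?

Illustrative only. Churn and bandwidth are hypothetical inputs; compression
and bursty peaks are deliberately not modeled.
"""

SECONDS_PER_DAY = 86_400


def required_mbps(churn_gb_per_day: float) -> float:
    """Convert a daily churn volume (decimal GB/day) into an average rate in Mbps."""
    bits_per_day = churn_gb_per_day * 1e9 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


def throughput_sufficient(churn_gb_per_day: list[float], link_mbps: float) -> bool:
    """True if the link can sustain the combined average churn of all VMs."""
    total = sum(required_mbps(churn) for churn in churn_gb_per_day)
    return total <= link_mbps


if __name__ == "__main__":
    vms = [200.0, 150.0, 300.0]  # hypothetical churn per VM, GB/day
    print(f"{sum(required_mbps(c) for c in vms):.1f} Mbps needed on average")
    print("50 Mbps link sufficient:", throughput_sufficient(vms, 50))
    print("100 Mbps link sufficient:", throughput_sufficient(vms, 100))
```

In this example, three VMs that together change 650 GB per day need roughly 60 Mbps on average. A 50 Mbps link falls behind, and the backlog then accumulates in the process server cache.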

All of these issues can affect the recovery point objective (RPO) of VMs.

Why? Generating a recovery point for a VM requires all disks on the VM to share a common point in time. If one disk has a high churn rate, if replication is slow, or if the process server isn't performing optimally, recovery points are created less efficiently.
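The "common point" requirement means the slowest disk sets the pace for the whole VM. The following is a simplified model of that idea, not Site Recovery's actual recovery-point algorithm. The per-disk "replicated up to" timestamps are hypothetical inputs. The newest point that every disk has reached is the minimum of those timestamps, and the effective RPO is how far that point lags behind the current time.

```python
from datetime import datetime, timedelta


def latest_common_point(disk_replicated_up_to: dict[str, datetime]) -> datetime:
    """Newest point in time that *every* disk has replicated up to.

    Simplified model: a VM-wide recovery point needs data from all disks at
    the same moment, so the disk that lags furthest behind bounds it.
    """
    if not disk_replicated_up_to:
        raise ValueError("the VM has no disks")
    return min(disk_replicated_up_to.values())


def effective_rpo(disk_replicated_up_to: dict[str, datetime], now: datetime) -> timedelta:
    """How far the newest possible VM-wide recovery point lags behind 'now'."""
    return now - latest_common_point(disk_replicated_up_to)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 10, 1)
    disks = {  # hypothetical replication progress per disk
        "os": datetime(2024, 1, 1, 10, 0),
        "data1": datetime(2024, 1, 1, 9, 58),
        "data2-high-churn": datetime(2024, 1, 1, 9, 40),
    }
    print(effective_rpo(disks, now))
```

Two of the three disks are nearly current, yet the single high-churn disk holds the VM's effective RPO at 21 minutes.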

Monitor proactively

To avoid issues with the process server, it's important to:

  • Understand the specific requirements for process servers by using the capacity and sizing guidance, and make sure process servers are deployed and running according to those recommendations.
  • Monitor alerts and troubleshoot issues as they occur, to keep process servers running efficiently.

Process server alerts

The process server generates a number of health alerts, summarized in the following table.

Alert type | Details
Healthy | The process server is connected and healthy.
Warning | CPU utilization > 80% for the last 15 minutes.
Warning | Memory usage > 80% for the last 15 minutes.
Warning | Cache folder free space < 30% for the last 15 minutes.
Warning | Site Recovery monitors pending/outgoing data every five minutes and estimates that data in the process server cache can't be uploaded to Azure within 30 minutes.
Warning | Process server services haven't been running for the last 15 minutes.
Critical | CPU utilization > 95% for the last 15 minutes.
Critical | Memory usage > 95% for the last 15 minutes.
Critical | Cache folder free space < 25% for the last 15 minutes.
Critical | Site Recovery monitors pending/outgoing data every five minutes and estimates that data in the process server cache can't be uploaded to Azure within 45 minutes.
Critical | No heartbeat from the process server for 15 minutes.
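The thresholds in the table can be expressed as small classification functions. The following is a sketch only. Each function expects a single figure that already covers the last 15 minutes, and the table doesn't say how Site Recovery aggregates the readings in that window. The upload check uses a plain "pending bytes divided by upload rate" estimate. That estimate is an assumption, not Site Recovery's documented estimator. Exact behavior at the boundary values is also assumed.

```python
from enum import IntEnum


class Health(IntEnum):
    """Alert levels from the table; a higher value is worse."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def usage_level(percent_last_15_min: float) -> Health:
    """CPU or memory utilization: > 95% is critical, > 80% is a warning."""
    if percent_last_15_min > 95:
        return Health.CRITICAL
    if percent_last_15_min > 80:
        return Health.WARNING
    return Health.HEALTHY


def cache_free_space_level(free_percent_last_15_min: float) -> Health:
    """Cache folder free space: < 25% is critical, < 30% is a warning."""
    if free_percent_last_15_min < 25:
        return Health.CRITICAL
    if free_percent_last_15_min < 30:
        return Health.WARNING
    return Health.HEALTHY


def upload_backlog_level(pending_bytes: float, upload_bytes_per_sec: float) -> Health:
    """Estimated time to upload cached data: > 45 min critical, > 30 min warning.

    Assumption: a simple pending / throughput estimate.
    """
    if upload_bytes_per_sec <= 0:
        minutes = float("inf")  # nothing is uploading, so the backlog never drains
    else:
        minutes = pending_bytes / upload_bytes_per_sec / 60
    if minutes > 45:
        return Health.CRITICAL
    if minutes > 30:
        return Health.WARNING
    return Health.HEALTHY


def services_level(minutes_services_stopped: float) -> Health:
    """Process server services not running for 15 minutes is a warning."""
    return Health.WARNING if minutes_services_stopped >= 15 else Health.HEALTHY


def heartbeat_level(minutes_since_heartbeat: float) -> Health:
    """No heartbeat from the process server for 15 minutes is critical."""
    return Health.CRITICAL if minutes_since_heartbeat >= 15 else Health.HEALTHY


if __name__ == "__main__":
    # Hypothetical readings: 10 GB pending, uploading at 5 MB/s (about 33 minutes).
    print(upload_backlog_level(10e9, 5e6).name)
```
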


Note: The overall health status of the process server is based on the worst alert generated.
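"Based on the worst alert" is a maximum over an ordered set of levels. The following minimal sketch assumes the three levels used in the alert table and treats a server with no active alerts as healthy. It's an illustration of the rule, not a Site Recovery API.

```python
from enum import IntEnum


class Health(IntEnum):
    """Alert levels ordered so that a larger value is a worse state."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def overall_health(alerts: list[Health]) -> Health:
    """Overall status is the worst individual alert; no alerts means healthy."""
    return max(alerts, default=Health.HEALTHY)


if __name__ == "__main__":
    # One warning among otherwise healthy checks makes the server a warning overall.
    print(overall_health([Health.HEALTHY, Health.WARNING, Health.HEALTHY]).name)
```

A single critical alert therefore outweighs any number of healthy or warning results.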

Monitor process server health

You can monitor the health of your process servers as follows:

  1. To monitor the replication health and status of a replicated machine and its process server, go to the vault > Replicated items, and click the machine you want to monitor.

  2. In Replication Health, you can monitor the VM health status. Click the status to drill down into the error details.

    [Image: Process server health in the VM dashboard]

  3. In Process Server Health, you can monitor the status of the process server. Drill down for details.

    [Image: Process server details in the VM dashboard]

  4. You can also monitor health by using the graphical representation on the VM page.

    • A scale-out process server is highlighted in orange if it has associated warnings, and in red if it has any critical issues.
    • If the process server runs in the default deployment on the configuration server, the configuration server is highlighted in the same way.
    • To drill down, click the configuration server or process server. Note any issues and any remediation recommendations.

You can also monitor process servers in the vault under Site Recovery Infrastructure. In Manage your Site Recovery infrastructure, click Configuration Servers. Select the configuration server associated with the process server, and drill down into the process server details.

Next steps