Troubleshoot replication issues for VMware VMs and physical servers

This article describes some common issues and specific errors you might encounter when you replicate on-premises VMware VMs and physical servers to Azure by using Site Recovery.

Step 1: Monitor process server health

Site Recovery uses the process server to receive and optimize replicated data, and send it to Azure.

We recommend that you monitor the health of process servers in the portal to ensure that they are connected and working properly, and that replication is progressing for the source machines associated with each process server.

Step 2: Troubleshoot connectivity and replication issues

Initial and ongoing replication failures are often caused by connectivity issues between the source server and the process server, or between the process server and Azure.

To solve these issues, troubleshoot connectivity and replication.

Step 3: Troubleshoot source machines that aren't available for replication

When you try to select the source machine to enable replication by using Site Recovery, the machine might not be available for one of the following reasons:

  • Two virtual machines with the same instance UUID: If two virtual machines under the vCenter have the same instance UUID, the first virtual machine discovered by the configuration server is shown in the Azure portal. To resolve this issue, ensure that no two virtual machines have the same instance UUID. This scenario is common when a backup VM becomes active and is logged in the discovery records. To resolve it, refer to Azure Site Recovery VMware-to-Azure: How to clean up duplicate or stale entries.
  • Incorrect vCenter user credentials: Ensure that you added the correct vCenter credentials when you set up the configuration server by using the OVF template or unified setup. To verify the credentials that you added during setup, see Modify credentials for automatic discovery.
  • vCenter insufficient privileges: If the account used to access vCenter doesn't have the required permissions, discovery of virtual machines might fail. Ensure that the permissions described in Prepare an account for automatic discovery are added to the vCenter user account.
  • Azure Site Recovery management servers: If the virtual machine is used as a management server in one or more of the following roles (configuration server, scale-out process server, or master target server), you can't select the virtual machine in the portal. Management servers cannot be replicated.
  • Already protected/failed over through Azure Site Recovery services: If the virtual machine is already protected or failed over through Site Recovery, it isn't available to select for protection in the portal. Ensure that the virtual machine you're looking for in the portal isn't already protected by another user or under a different subscription.
  • vCenter not connected: Check whether vCenter is in a connected state. To verify, go to Recovery Services vault > Site Recovery Infrastructure > Configuration Servers and select the respective configuration server. A blade opens on the right with details of the associated servers. Check whether vCenter is connected. If it's in a "Not Connected" state, resolve the issue and then refresh the configuration server in the portal. The virtual machine is then listed in the portal.
  • ESXi powered off: If the ESXi host on which the virtual machine resides is powered off, the virtual machine isn't listed or isn't selectable in the Azure portal. Power on the ESXi host and refresh the configuration server in the portal. The virtual machine is then listed in the portal.
  • Pending reboot: If there is a pending reboot on the virtual machine, you can't select the machine in the Azure portal. Complete the pending reboot activities and refresh the configuration server. The virtual machine is then listed in the portal.
  • IP not found: If the virtual machine doesn't have a valid IP address associated with it, you can't select the machine in the Azure portal. Assign a valid IP address to the virtual machine and refresh the configuration server. The virtual machine is then listed in the portal.
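
The duplicate instance UUID case in the list above can be checked offline once you have an inventory export. The following is a minimal Python sketch, assuming a hypothetical list of (VM name, instance UUID) pairs that you have already exported from vCenter by whatever means you normally use. The function name and the sample data are illustrative and are not part of Site Recovery.

```python
from collections import defaultdict


def find_duplicate_instance_uuids(inventory):
    """Return {uuid: [vm names]} for every instance UUID used by more than one VM.

    `inventory` is an iterable of (vm_name, instance_uuid) pairs.
    UUIDs are compared case-insensitively.
    """
    by_uuid = defaultdict(list)
    for vm_name, instance_uuid in inventory:
        by_uuid[instance_uuid.strip().lower()].append(vm_name)
    return {uuid: names for uuid, names in by_uuid.items() if len(names) > 1}


# Hypothetical sample inventory (not real data).
sample = [
    ("web01", "5025f0a1-0000-0000-0000-000000000001"),
    ("web01-backup", "5025F0A1-0000-0000-0000-000000000001"),
    ("db01", "5025f0a1-0000-0000-0000-000000000002"),
]
duplicates = find_duplicate_instance_uuids(sample)
for uuid, names in duplicates.items():
    print(f"{uuid}: {', '.join(names)}")
```

Any UUID that this reports is a candidate for the cleanup described in the duplicate or stale entries article.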

Troubleshoot protected virtual machines greyed out in the portal

Virtual machines that are replicated under Site Recovery aren't available in the Azure portal if there are duplicate entries in the system. To learn how to delete stale entries and resolve the issue, refer to Azure Site Recovery VMware-to-Azure: How to clean up duplicate or stale entries.

No crash-consistent recovery point available for the VM in the last 'XXX' minutes

Some of the most common issues are listed below.

Initial replication issues [error 78169]

In addition to ensuring that there are no connectivity, bandwidth, or time-sync issues, ensure that:

  • No antivirus software is blocking Azure Site Recovery. Learn more about the folder exclusions required for Azure Site Recovery.

Source machines with high churn [error 78188]

Possible causes:

  • The data change rate (write bytes/sec) on the listed disks of the virtual machine exceeds the Azure Site Recovery supported limits for the replication target storage account type.
  • A sudden spike in the churn rate has left a large amount of data pending upload.

To resolve the issue:

  • Ensure that the target storage account type (Standard or Premium) is provisioned according to the churn rate requirement at the source.
  • If you are already replicating to a Premium managed disk (asrseeddisk type), ensure that the size of the disk supports the observed churn rate as per Site Recovery limits. You can increase the size of the asrseeddisk if required. Follow these steps:
    • Navigate to the Disks blade of the impacted replicated machine and copy the replica disk name.
    • Navigate to this replica managed disk.
    • You may see a banner on the Overview blade saying that a SAS URL has been generated. Click this banner and cancel the export. Ignore this step if you do not see the banner.
    • As soon as the SAS URL is revoked, go to the Configuration blade of the managed disk and increase the size so that Azure Site Recovery supports the observed churn rate on the source disk.
  • If the observed churn is temporary, wait a few hours for the pending data upload to catch up and for recovery points to be created.
  • If the disk contains non-critical data such as temporary logs or test data, consider moving this data elsewhere or excluding the disk from replication entirely.
  • If the problem persists, use the Site Recovery deployment planner to help plan replication.
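
To compare observed churn with a limit, it can help to turn raw write counters into an average rate. The following is a minimal Python sketch, assuming you have sampled the cumulative bytes written to a disk at known timestamps (for example, from a performance counter). The limit passed in is a placeholder that you would replace with the published Site Recovery limit for your target storage type. None of the names below come from Site Recovery itself.

```python
def average_churn_mbps(samples):
    """Average write rate in MB/s (1 MB = 1024 * 1024 bytes) between the first and last sample.

    `samples` is a list of (timestamp_seconds, cumulative_bytes_written) pairs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (b1 - b0) / (t1 - t0) / (1024 * 1024)


def exceeds_limit(samples, limit_mbps):
    """True if the average churn over the sampled window is above `limit_mbps`."""
    return average_churn_mbps(samples) > limit_mbps


# Hypothetical samples: 600 MB written over 60 seconds.
samples = [(0, 0), (30, 250 * 1024 * 1024), (60, 600 * 1024 * 1024)]
rate = average_churn_mbps(samples)
print(f"average churn: {rate:.1f} MB/s")
# limit_mbps=8.0 is a placeholder value, not a documented limit.
print("over limit" if exceeds_limit(samples, limit_mbps=8.0) else "within limit")
```

An average over a long window hides short spikes, so a sustained average above the limit and a brief burst call for different remedies from the list above.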

Source machines with no heartbeat [error 78174]

This error occurs when the Azure Site Recovery Mobility agent on the source machine is not communicating with the configuration server (CS).

To resolve the issue, use the following steps to verify the network connectivity from the source VM to the configuration server:

  1. Verify that the source machine is running.

  2. Sign in to the source machine using an account that has administrator privileges.

  3. Verify that the following services are running, and restart them if they are not:

    • Svagents (InMage Scout VX Agent)
    • InMage Scout Application Service
  4. On the source machine, examine the logs at the following location for error details:

    C:\Program Files (X86)\Microsoft Azure Site Recovery\agent\svagents*.log
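
Reading several rotated log files by hand is tedious, so a small script can pull out candidate lines. The following is a minimal Python sketch. It assumes that relevant entries contain the word "error", which is a guess about the log content and not a documented format, so adjust the keyword to match what you actually see in the files.

```python
import glob
import os


def find_error_lines(log_dir, pattern="svagents*.log", keyword="error"):
    """Yield (file name, line number, text) for lines containing `keyword` (case-insensitive)."""
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if keyword.lower() in line.lower():
                    yield os.path.basename(path), number, line.rstrip()


if __name__ == "__main__":
    # Log location taken from step 4 above.
    agent_dir = r"C:\Program Files (X86)\Microsoft Azure Site Recovery\agent"
    for name, number, text in find_error_lines(agent_dir):
        print(f"{name}:{number}: {text}")
```

The same function works for other log locations in this article if you pass a different directory and file pattern.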

Process server with no heartbeat [error 806]

If there is no heartbeat from the process server (PS):

  1. Check that the PS VM is up and running.

  2. Check the following logs on the PS for error details:

    C:\ProgramData\ASR\home\svsystems\eventmanager*.log
    and
    C:\ProgramData\ASR\home\svsystems\monitor_protection*.log

Master target server with no heartbeat [error 78022]

This error occurs when the Azure Site Recovery Mobility agent on the master target is not communicating with the configuration server.

To resolve the issue, use the following steps to verify the service status:

  1. Verify that the master target VM is running.
  2. Sign in to the master target VM using an account that has administrator privileges.
    • Verify that the svagents service is running. If it is running, restart the service.

    • Check the logs at the following location for error details:

      C:\Program Files (X86)\Microsoft Azure Site Recovery\agent\svagents*.log

  3. To register the master target with the configuration server, navigate to the folder %PROGRAMDATA%\ASR\Agent and run the following commands at a command prompt:

    cdpcli.exe --registermt

    net stop obengine

    net start obengine

    exit

Error ID 78144 - No app-consistent recovery point available for the VM in the last 'XXX' minutes

Mobility agent versions 9.23 and 9.27 include enhancements for handling VSS installation failures. Ensure that you are on the latest version for the best guidance on troubleshooting VSS failures.

Some of the most common issues are listed below.

Cause 1: Known issue in SQL Server 2008/2008 R2

How to fix: There is a known issue with SQL Server 2008/2008 R2. Refer to the KB article Azure Site Recovery Agent or other non-component VSS backup fails for a server hosting SQL Server 2008 R2.

Cause 2: Azure Site Recovery jobs fail on servers hosting any version of SQL Server instances with AUTO_CLOSE DBs

How to fix: Refer to the KB article.

Cause 3: Known issue in SQL Server 2016 and 2017

How to fix: Refer to the KB article.

Cause 4: App-consistency not enabled on Linux servers

How to fix: Azure Site Recovery for Linux operating systems supports custom application scripts for app-consistency. The Azure Site Recovery Mobility agent uses a custom script with pre and post options to achieve app-consistency. Here are the steps to enable it.

To troubleshoot further, check the following file on the source machine to get the exact error code for the failure:

C:\Program Files (x86)\Microsoft Azure Site Recovery\agent\Application Data\ApplicationPolicyLogs\vacp.log

How do you locate the errors in the file? Open the vacp.log file in an editor and search for the string "vacpError".

Ex: vacpError:220#Following disks are in FilteringStopped state [\\.\PHYSICALDRIVE1=5, ]#220|^|224#FAILED: CheckWriterStatus().#2147754994|^|226#FAILED to revoke tags.FAILED: CheckWriterStatus().#2147754994|^|

In the above example, 2147754994 is the error code that identifies the failure. The sections below describe specific error codes.
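
If you need to pull the codes out of many such lines, a small parser can help. The following is a minimal Python sketch. The field layout it assumes (entries separated by "|^|", each of the form id#message#code) is inferred only from the single example line above and is not a documented format, so treat the function as illustrative.

```python
def parse_vacp_error(line):
    """Split a vacpError line into (entry_id, message, code) tuples.

    Assumed layout (inferred from one sample): vacpError:<id>#<message>#<code>|^|...
    Chunks that do not fit this layout are skipped.
    """
    marker = "vacpError:"
    start = line.find(marker)
    if start == -1:
        return []
    entries = []
    for chunk in line[start + len(marker):].split("|^|"):
        chunk = chunk.strip()
        if not chunk:
            continue
        entry_id, _, rest = chunk.partition("#")
        message, _, code = rest.rpartition("#")
        try:
            entries.append((int(entry_id), message, int(code)))
        except ValueError:
            continue  # malformed chunk, ignore it
    return entries


# The example line from this article.
sample = (
    r"vacpError:220#Following disks are in FilteringStopped state "
    r"[\\.\PHYSICALDRIVE1=5, ]#220|^|224#FAILED: CheckWriterStatus().#2147754994|^|"
    r"226#FAILED to revoke tags.FAILED: CheckWriterStatus().#2147754994|^|"
)
entries = parse_vacp_error(sample)
for entry_id, message, code in entries:
    print(entry_id, code, message)
```

The last field of each entry is the value to look up in the error sections of this article.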

VSS writer is not installed - Error 2147221164

How to fix: To generate the application consistency tag, Azure Site Recovery uses the Microsoft Volume Shadow Copy Service (VSS). It installs a VSS Provider, which it needs to take app-consistent snapshots. This VSS Provider is installed as a service. If the VSS Provider service is not installed, application-consistent snapshot creation fails with the error ID 0x80040154 "Class not registered".
Refer to the article on troubleshooting VSS writer installation.

VSS writer is disabled - Error 2147943458

How to fix: To generate the application consistency tag, Azure Site Recovery uses the Microsoft Volume Shadow Copy Service (VSS). It installs a VSS Provider, which it needs to take app-consistent snapshots. This VSS Provider is installed as a service. If the VSS Provider service is disabled, application-consistent snapshot creation fails with the error "The specified service is disabled and cannot be started (0x80070422)".

  • If the VSS Provider service is disabled:
    • Verify that the startup type of the VSS Provider service is set to Automatic.
    • Restart the following services:
      • VSS service
      • Azure Site Recovery VSS Provider
      • VDS service

VSS PROVIDER NOT_REGISTERED - Error 2147754756

How to fix: To generate the application consistency tag, Azure Site Recovery uses the Microsoft Volume Shadow Copy Service (VSS). Check whether the Azure Site Recovery VSS Provider service is installed.

  • Retry the provider installation using the following commands:
    • Uninstall the existing provider: C:\Program Files (x86)\Microsoft Azure Site Recovery\agent\InMageVSSProvider_Uninstall.cmd
    • Reinstall: C:\Program Files (x86)\Microsoft Azure Site Recovery\agent\InMageVSSProvider_Install.cmd
  • Verify that the startup type of the VSS Provider service is set to Automatic.
  • Restart the following services:
    • VSS service
    • Azure Site Recovery VSS Provider
    • VDS service
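
The decimal codes in the headings above and the hexadecimal codes in the error messages are the same 32-bit values in different notation. The following minimal Python sketch converts between them; the function name is illustrative. One detail is worth noting: 2147943458 and 2147754756 convert directly to 0x80070422 and 0x80042304, whereas 2147221164 matches 0x80040154 only when it is read as the magnitude of a negative signed 32-bit integer. This is an arithmetic observation, not a statement about how the agent formats its output.

```python
def to_hresult_hex(code):
    """Format a decimal error code as a 32-bit hexadecimal string.

    Negative values are interpreted as signed 32-bit integers.
    """
    return f"0x{code & 0xFFFFFFFF:08X}"


# Codes that appear in this article; the last one is passed as a negative value.
for decimal_code in (2147943458, 2147754756, 2147754994, -2147221164):
    print(decimal_code, "->", to_hresult_hex(decimal_code))
```

Searching for the hexadecimal form is usually more productive, because Windows and VSS documentation lists error codes in that notation.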

Error ID 95001 - Insufficient permissions found

This error occurs when you try to enable replication and the application folders don't have sufficient permissions.

How to fix: To resolve this issue, make sure the IUSR user has the owner role for all of the following folders:

  • C:\ProgramData\Azure Site Recovery\private
  • The installation directory. For example, if the installation directory is on the F drive, provide the correct permissions to:
    • F:\Program Files (x86)\Microsoft Azure Site Recovery\home\svsystems
  • The \pushinstallsvc folder in the installation directory. For example, if the installation directory is on the F drive, provide the correct permissions to:
    • F:\Program Files (x86)\Microsoft Azure Site Recovery\home\svsystems\pushinstallsvc
  • The \etc folder in the installation directory. For example, if the installation directory is on the F drive, provide the correct permissions to:
    • F:\Program Files (x86)\Microsoft Azure Site Recovery\home\svsystems\etc
  • C:\Temp
  • C:\thirdparty\php5nts
  • All the items under the following path:
    • C:\thirdparty\rrdtool-1.2.15-win32-perl58\rrdtool\Release*

Next steps

If you need more help, post your question on the Microsoft Q&A question page for Azure Site Recovery. We have an active community, and one of our engineers can assist you.