Introduction to resource troubleshooting in Azure Network Watcher

Virtual network gateways provide connectivity between on-premises resources and other virtual networks within Azure. Monitoring gateways and their connections is critical to ensuring that communication is not broken. Network Watcher provides the capability to troubleshoot gateways and connections. The capability can be called through the portal, PowerShell, Azure CLI, or the REST API. When called, Network Watcher diagnoses the health of the gateway or connection and returns the appropriate results. The request is a long-running transaction; the results are returned once the diagnosis is complete.

[Image: portal]

Results

The preliminary results give an overall picture of the health of the resource. More detailed information can be provided for resources, as shown in the following sections.

The troubleshoot API returns the following values:

  • startTime - The time the troubleshoot API call started.
  • endTime - The time the troubleshooting ended.
  • code - This value is UnHealthy if there is even a single diagnosis failure.
  • results - A collection of results returned on the connection or the virtual network gateway.
    • id - The fault type.
    • summary - A summary of the fault.
    • detailed - A detailed description of the fault.
    • recommendedActions - A collection of recommended actions to take.
      • actionText - Text describing what action to take.
      • actionUri - The URI to documentation on how to act.
      • actionUriText - A short description of the action text.
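
As an illustration, the fields listed above can be walked with a few lines of Python. This is a minimal sketch, not the service's client library: it assumes the result has been obtained as JSON with the nesting shown in the list, and the sample payload (its timestamps, texts, and URI) and the function name `summarize` are invented for the example.

```python
import json

# Hypothetical payload shaped like the field list above; all values are invented.
SAMPLE_RESULT = """
{
  "startTime": "2017-02-02T17:34:23Z",
  "endTime": "2017-02-02T17:40:01Z",
  "code": "UnHealthy",
  "results": [
    {
      "id": "Authentication",
      "summary": "Preshared key mismatch",
      "detailed": "Placeholder detail text for this sketch.",
      "recommendedActions": [
        {
          "actionText": "Verify the shared key on both gateways.",
          "actionUri": "https://example.com/hypothetical-doc",
          "actionUriText": "Hypothetical link text"
        }
      ]
    }
  ]
}
"""


def summarize(result):
    """Return printable lines: overall code first, then each fault and its actions."""
    lines = [f"{result.get('code')} ({result.get('startTime')} -> {result.get('endTime')})"]
    for fault in result.get("results", []):
        lines.append(f"- {fault.get('id')}: {fault.get('summary')}")
        for action in fault.get("recommendedActions", []):
            lines.append(f"    * {action.get('actionText')} [{action.get('actionUri')}]")
    return lines


summary_lines = summarize(json.loads(SAMPLE_RESULT))
print("\n".join(summary_lines))
```

Using `dict.get` throughout keeps the sketch tolerant of results that omit optional fields such as recommendedActions.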

The following tables show the available fault types (id under results in the preceding list) and whether each fault creates logs.

Gateway

Fault Type | Reason | Log
NoFault | No error is detected. | Yes
GatewayNotFound | The gateway cannot be found or is not provisioned. | No
PlannedMaintenance | The gateway instance is under maintenance. | No
UserDrivenUpdate | A user update is in progress. The update could be a resize operation. | No
VipUnResponsive | The primary instance of the gateway can't be reached due to a health probe failure. | No
PlatformInActive | There is an issue with the platform. | No
ServiceNotRunning | The underlying service is not running. | No
NoConnectionsFoundForGateway | No connections exist on the gateway. This fault is only a warning. | No
ConnectionsNotConnected | None of the connections are connected. This fault is only a warning. | Yes
GatewayCPUUsageExceeded | The current gateway CPU usage is > 95%. | Yes

Connection

Fault Type | Reason | Log
NoFault | No error is detected. | Yes
GatewayNotFound | The gateway cannot be found or is not provisioned. | No
PlannedMaintenance | The gateway instance is under maintenance. | No
UserDrivenUpdate | A user update is in progress. The update could be a resize operation. | No
VipUnResponsive | The primary instance of the gateway can't be reached due to a health probe failure. | No
ConnectionEntityNotFound | The connection configuration is missing. | No
ConnectionIsMarkedDisconnected | The connection is marked "disconnected". | No
ConnectionNotConfiguredOnGateway | The underlying service does not have the connection configured. | Yes
ConnectionMarkedStandby | The underlying service is marked as standby. | Yes
Authentication | Preshared key mismatch. | Yes
PeerReachability | The peer gateway is not reachable. | Yes
IkePolicyMismatch | The peer gateway has IKE policies that are not supported by Azure. | Yes
WfpParse Error | An error occurred parsing the WFP log. | Yes
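
When scripting against troubleshooting results, it can help to know up front whether a fault is expected to produce log files. The sketch below simply transcribes the Log column of the two tables above into Python sets. The function name `creates_logs` and the labels "gateway" and "connection" are choices made for this example, not part of any API.

```python
# Fault types whose Log column is "Yes", transcribed from the two tables above.
GATEWAY_FAULTS_WITH_LOGS = {
    "NoFault",
    "ConnectionsNotConnected",
    "GatewayCPUUsageExceeded",
}
CONNECTION_FAULTS_WITH_LOGS = {
    "NoFault",
    "ConnectionNotConfiguredOnGateway",
    "ConnectionMarkedStandby",
    "Authentication",
    "PeerReachability",
    "IkePolicyMismatch",
    "WfpParse Error",
}


def creates_logs(fault_id, resource_kind):
    """Return True if the tables list this fault as creating logs.

    resource_kind is "gateway" or "connection" (labels chosen for this sketch).
    A fault id that is not in the tables yields False, so treat False as
    "not known to create logs" rather than a guarantee.
    """
    tables = {
        "gateway": GATEWAY_FAULTS_WITH_LOGS,
        "connection": CONNECTION_FAULTS_WITH_LOGS,
    }
    return fault_id in tables[resource_kind]


print(creates_logs("Authentication", "connection"))
print(creates_logs("GatewayNotFound", "gateway"))
```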

Supported Gateway types

The following table lists which gateways and connections are supported with Network Watcher troubleshooting:

Gateway types
  VPN | Supported
  ExpressRoute | Not Supported
VPN types
  Route Based | Supported
  Policy Based | Not Supported
Connection types
  IPSec | Supported
  VNet2Vnet | Supported
  ExpressRoute | Not Supported
  VPNClient | Not Supported

Log files

The resource troubleshooting log files are stored in a storage account after resource troubleshooting is finished. The following image shows the example contents of a call that resulted in an error.

[Image: zip file]

Note

In some cases, only a subset of the log files is written to storage.

For instructions on downloading files from Azure storage accounts, refer to Get started with Azure Blob storage using .NET. Another tool that can be used is Storage Explorer. For more information, see Storage Explorer.
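
Because only a subset of the logs may be written, a quick check of which files a downloaded archive actually contains can be useful. The following sketch uses Python's standard `zipfile` module and the file names described in the sections that follow. It assumes the archive has already been downloaded and makes no assumption about folder layout inside the zip; the in-memory archive at the end is built only to demonstrate the function.

```python
import io
import zipfile

# Log file names described in the sections that follow.
KNOWN_LOG_FILES = [
    "ConnectionStats.txt",
    "CPUStats.txt",
    "IKEErrors.txt",
    "Scrubbed-wfpdiag.txt",
    "wfpdiag.txt.sum",
]


def present_log_files(zip_bytes):
    """Return the known log files found in the archive, ignoring any folder prefix."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        base_names = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return [name for name in KNOWN_LOG_FILES if name in base_names]


# Demonstration with a tiny in-memory archive (stand-in for a downloaded zip).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ConnectionStats.txt", "Connectivity State : Connected\n")

found = present_log_files(buffer.getvalue())
print(found)
```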

ConnectionStats.txt

The ConnectionStats.txt file contains overall statistics for the connection, including ingress and egress bytes, connection status, and the time the connection was established.

Note

If the call to the troubleshooting API returns healthy, the only file returned in the zip file is ConnectionStats.txt.

The contents of this file are similar to the following example:

Connectivity State : Connected
Remote Tunnel Endpoint :
Ingress Bytes (since last connected) : 288 B
Egress Bytes (Since last connected) : 288 B
Connected Since : 2/1/2017 8:22:06 PM
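
The example above is a list of "name : value" lines, so it can be read into a dictionary with very little code. This is a minimal sketch that assumes the file always follows the layout shown; `parse_connection_stats` is a name chosen for this example.

```python
SAMPLE = """Connectivity State : Connected
Remote Tunnel Endpoint :
Ingress Bytes (since last connected) : 288 B
Egress Bytes (Since last connected) : 288 B
Connected Since : 2/1/2017 8:22:06 PM
"""


def parse_connection_stats(text):
    """Split each 'name : value' line at the first ' :' separator."""
    stats = {}
    for line in text.splitlines():
        if " :" not in line:
            continue  # skip blank or unexpected lines
        key, _, value = line.partition(" :")
        stats[key.strip()] = value.strip()
    return stats


stats = parse_connection_stats(SAMPLE)
print(stats["Connectivity State"])  # Connected
print(stats["Connected Since"])     # 2/1/2017 8:22:06 PM
```

Splitting on the first " :" (space then colon) rather than on every colon keeps the time in "Connected Since" intact.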

CPUStats.txt

The CPUStats.txt file contains the CPU usage and available memory at the time of testing. The contents of this file are similar to the following example:

Current CPU Usage : 0 % Current Memory Available : 641 MBs
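
The two values can be pulled out with regular expressions. The sketch below assumes the labels and units appear exactly as in the example above and works whether the two values share one line or sit on separate lines; `parse_cpu_stats` and the dictionary keys are names chosen for this example.

```python
import re

SAMPLE = "Current CPU Usage : 0 % Current Memory Available : 641 MBs"


def parse_cpu_stats(text):
    """Extract CPU usage (percent) and available memory (MB) as integers."""
    cpu = re.search(r"Current CPU Usage\s*:\s*(\d+)\s*%", text)
    mem = re.search(r"Current Memory Available\s*:\s*(\d+)\s*MBs", text)
    return {
        "cpu_percent": int(cpu.group(1)) if cpu else None,
        "memory_mb": int(mem.group(1)) if mem else None,
    }


cpu_stats = parse_cpu_stats(SAMPLE)
print(cpu_stats)  # {'cpu_percent': 0, 'memory_mb': 641}
```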

IKEErrors.txt

The IKEErrors.txt file contains any IKE errors that were found during monitoring.

The following example shows the contents of an IKEErrors.txt file. Your errors may differ depending on the issue.

Error: Authentication failed. Check shared key. Check crypto. Check lifetimes. 
     based on log : Peer failed with Windows error 13801(ERROR_IPSEC_IKE_AUTH_FAIL)
Error: On-prem device sent invalid payload. 
     based on log : IkeFindPayloadInPacket failed with Windows error 13843(ERROR_IPSEC_IKE_INVALID_PAYLOAD)
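
Each entry in the example pairs an "Error:" line with a "based on log" line that carries a Windows error number and name. A minimal sketch for pairing them up follows. It assumes every file uses the two-line layout shown above, which may not hold for all errors; `parse_ike_errors` and the dictionary keys are names chosen for this example.

```python
import re

SAMPLE = """Error: Authentication failed. Check shared key. Check crypto. Check lifetimes.
     based on log : Peer failed with Windows error 13801(ERROR_IPSEC_IKE_AUTH_FAIL)
Error: On-prem device sent invalid payload.
     based on log : IkeFindPayloadInPacket failed with Windows error 13843(ERROR_IPSEC_IKE_INVALID_PAYLOAD)
"""


def parse_ike_errors(text):
    """Return one dict per 'Error:' line, enriched by the following 'based on log' line."""
    errors = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Error:"):
            message = stripped[len("Error:"):].strip()
            errors.append({"message": message, "code": None, "name": None})
        elif stripped.startswith("based on log") and errors:
            match = re.search(r"Windows error (\d+)\((\w+)\)", stripped)
            if match:
                errors[-1]["code"] = int(match.group(1))
                errors[-1]["name"] = match.group(2)
    return errors


ike_errors = parse_ike_errors(SAMPLE)
for err in ike_errors:
    print(err["code"], err["name"], "-", err["message"])
```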

Scrubbed-wfpdiag.txt

The Scrubbed-wfpdiag.txt log file contains the WFP log. This log records packet drops and IKE/AuthIP failures.

The following example shows the contents of the Scrubbed-wfpdiag.txt file. In this example, the shared key of a connection was not correct, as can be seen from the third line from the bottom. The example is only a snippet of the entire log, because the log can be lengthy depending on the issue.

...
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|Deleted ICookie from the high priority thread pool list
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|IKE diagnostic event:
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|Event Header:
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  Timestamp: 1601-01-01T00:00:00.000Z
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  Flags: 0x00000106
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|    Local address field set
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|    Remote address field set
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|    IP version field set
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  IP version: IPv4
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  IP protocol: 0
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  Local address: 13.78.238.92
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  Remote address: 52.161.24.36
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  Local Port: 0
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  Remote Port: 0
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  Application ID:
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  User SID: <invalid>
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|Failure type: IKE/Authip Main Mode Failure
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|Type specific info:
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  Failure error code:0x000035e9
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|    IKE authentication credentials are unacceptable
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  Failure point: Remote
...
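
Since the full log can be long, it may be easier to scan it for failure events than to read it line by line. The sketch below looks for the "Failure type", "Failure error code", and "Failure point" lines seen in the snippet above. It assumes every log line has the "prefix|address|message" shape shown there and that the line after the error code is its description; `find_failures` and the dictionary keys are names chosen for this example.

```python
SAMPLE = """[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|Failure type: IKE/Authip Main Mode Failure
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|Type specific info:
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  Failure error code:0x000035e9
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|    IKE authentication credentials are unacceptable
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|
[0]0368.03A4::02/02/2017-17:36:01.496 [ikeext] 3038|52.161.24.36|  Failure point: Remote
"""


def find_failures(text):
    """Collect failure type, error code, description, and failure point per event."""
    # Keep only the message part after the second '|' of each line.
    messages = [
        line.split("|", 2)[2].strip() if line.count("|") >= 2 else ""
        for line in text.splitlines()
    ]
    failures = []
    for index, message in enumerate(messages):
        if message.startswith("Failure type:"):
            failures.append({"type": message.split(":", 1)[1].strip()})
        elif message.startswith("Failure error code:") and failures:
            failures[-1]["code"] = int(message.split(":", 1)[1], 16)
            if index + 1 < len(messages):
                failures[-1]["description"] = messages[index + 1]
        elif message.startswith("Failure point:") and failures:
            failures[-1]["point"] = message.split(":", 1)[1].strip()
    return failures


failures = find_failures(SAMPLE)
print(failures)
```

The error code 0x000035e9 is 13801 in decimal, the same Windows error number that the IKEErrors.txt example reports as ERROR_IPSEC_IKE_AUTH_FAIL.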

wfpdiag.txt.sum

The wfpdiag.txt.sum file is a log showing the buffers and events processed.

The following example shows the contents of a wfpdiag.txt.sum file.

Files Processed:
    C:\Resources\directory\924336c47dd045d5a246c349b8ae57f2.GatewayTenantWorker.DiagnosticsStorage\2017-02-02T17-34-23\wfpdiag.etl
Total Buffers Processed 8
Total Events  Processed 2169
Total Events  Lost      0
Total Format  Errors    0
Total Formats Unknown   486
Elapsed Time            330 sec
+-----------------------------------------------------------------------------------+
|EventCount    EventName            EventType   TMF                                 |
+-----------------------------------------------------------------------------------+
|        36    ikeext               ike_addr_utils_c844  a0c064ca-d954-350a-8b2f-1a7464eef8b6|
|        12    ikeext               ike_addr_utils_c857  a0c064ca-d954-350a-8b2f-1a7464eef8b6|
|        96    ikeext               ike_addr_utils_c832  a0c064ca-d954-350a-8b2f-1a7464eef8b6|
|         6    ikeext               ike_bfe_callbacks_c133  1dc2d67f-8381-6303-e314-6c1452eeb529|
|         6    ikeext               ike_bfe_callbacks_c61  1dc2d67f-8381-6303-e314-6c1452eeb529|
|        12    ikeext               ike_sa_management_c5698  7857a320-42ee-6e90-d5d9-3f414e3ea2d3|
|         6    ikeext               ike_sa_management_c8447  7857a320-42ee-6e90-d5d9-3f414e3ea2d3|
|        12    ikeext               ike_sa_management_c494  7857a320-42ee-6e90-d5d9-3f414e3ea2d3|
|        12    ikeext               ike_sa_management_c642  7857a320-42ee-6e90-d5d9-3f414e3ea2d3|
|         6    ikeext               ike_sa_management_c3162  7857a320-42ee-6e90-d5d9-3f414e3ea2d3|
|        12    ikeext               ike_sa_management_c3307  7857a320-42ee-6e90-d5d9-3f414e3ea2d3|