Diagnose common code package errors by using Service Fabric

This article describes what it means for a code package to terminate unexpectedly. It covers possible causes of common error codes, along with troubleshooting steps.

When does a process or container terminate unexpectedly?

When Azure Service Fabric receives a request to start a code package, it begins preparing the environment on the local system according to the options set in the application and service manifests. These preparations might include reserving network endpoints or resources, configuring firewall rules, or setting up resource governance constraints.

After the environment has been configured properly, Service Fabric tries to bring up the code package. This step is considered successful if the OS or container runtime reports that the process or container has been activated successfully. If activation is unsuccessful, you should see a health message in Service Fabric Explorer (SFX) that resembles the following:

There was an error during CodePackage activation. Service host failed to activate. Error: 0xXXXXXXXX

After the code package has been successfully activated, Service Fabric begins monitoring its lifetime. From this point on, the process or container can terminate at any time, for any of a number of reasons. For example, it might have failed to initialize a DLL, or the OS might have run out of desktop heap space. If your code package terminated, you should see the following health message in SFX:

The process/container terminated with exit code: XXXXXXXX. Please look at your application logs/dump or debug your code package for more details. For information about common termination errors, please visit https://aka.ms/service-fabric-termination-errors

The exit code in this health message is the only clue that the process or container provides about why it terminated. It could be generated by any level of the stack. For example, the exit code might be related to an OS error or a .NET issue, or it might have been raised by your own code. Use this article as a starting point for diagnosing the source of termination exit codes and finding possible solutions. Keep in mind that these are general solutions to common scenarios and might not apply to the error you're seeing.
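When health messages are collected from many nodes, pulling the code out of the text by hand becomes tedious. The following is a minimal sketch, not an official API, that extracts the code from the two message shapes quoted above. The names `parse_health_message` and `CodePackageFailure` are hypothetical. It assumes the activation error is always `0x`-prefixed hexadecimal and that the termination exit code is printed either as a decimal number or with a `0x` prefix; check that against the messages your cluster actually emits.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns modeled on the two health messages quoted in this article.
# The hex alternative is listed first so "0x..." is not cut short at the "0".
ACTIVATION_RE = re.compile(
    r"Service host failed to activate\. Error:\s*(0x[0-9A-Fa-f]+)"
)
TERMINATION_RE = re.compile(
    r"terminated with exit code:\s*(0x[0-9A-Fa-f]+|-?\d+)"
)


@dataclass
class CodePackageFailure:
    kind: str  # "activation" or "termination"
    code: int  # normalized to an unsigned 32-bit value


def _to_int(text: str) -> int:
    """Parse decimal or 0x-prefixed text (assumption about the message format)."""
    base = 16 if text.lower().startswith("0x") else 10
    return int(text, base) & 0xFFFFFFFF


def parse_health_message(message: str) -> Optional[CodePackageFailure]:
    """Return the failure kind and code, or None if the message is unrelated."""
    match = ACTIVATION_RE.search(message)
    if match:
        return CodePackageFailure("activation", _to_int(match.group(1)))
    match = TERMINATION_RE.search(message)
    if match:
        return CodePackageFailure("termination", _to_int(match.group(1)))
    return None
```

Normalizing to an unsigned 32-bit value means a code that some tool reports as a negative number still compares equal to the decimal values listed in this article.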

How can I tell if Service Fabric terminated my code package?

Service Fabric might terminate your code package for a variety of reasons. For example, it might decide to place the code package on another node for load-balancing purposes. If you see any of the exit codes in the following table, Service Fabric terminated your code package.

Note

If your process or container terminates with an exit code other than the codes in the following table, Service Fabric is not responsible for terminating it.

Exit code | Description
7147 | Indicates that Service Fabric gracefully shut down the process or container by sending it a Ctrl+C signal.
7148 | Indicates that Service Fabric terminated the process or container. Sometimes, this exit code means that the process or container didn't respond in a timely manner after Service Fabric sent it a Ctrl+C signal, so it had to be terminated.
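A service host that reacts promptly to Ctrl+C is more likely to end with 7147 than 7148. The sketch below shows one way to do that, assuming the code package is a Python process; the names `handle_ctrl_c`, `install_handler`, and `run` are hypothetical, and the article does not say how long Service Fabric waits before it escalates, so keep the cleanup short. Python surfaces Ctrl+C as `SIGINT`; whether a container receives the same signal depends on the container runtime and is not covered here.

```python
import signal
import threading

# Set once a shutdown has been requested; the work loop polls this event.
shutdown_requested = threading.Event()


def handle_ctrl_c(signum, frame):
    """Record the request instead of letting KeyboardInterrupt unwind the process."""
    shutdown_requested.set()


def install_handler():
    """Register the handler for Ctrl+C. Must be called from the main thread."""
    signal.signal(signal.SIGINT, handle_ctrl_c)


def run(work_step, poll_seconds=1.0):
    """Call work_step repeatedly until shutdown is requested, then return 0."""
    while not shutdown_requested.is_set():
        work_step()
        # wait() returns early as soon as the event is set.
        shutdown_requested.wait(poll_seconds)
    # Flush logs and release resources here, quickly.
    return 0
```

A typical entry point would call `install_handler()` once and then pass the return value of `run(...)` to `sys.exit`, so that the process exits on its own after the signal arrives.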

Other common error codes and their potential fixes

Exit code | Hexadecimal value | Short description | Root cause | Potential fix
3221225794 | 0xc0000142 | STATUS_DLL_INIT_FAILED | This error sometimes means that the machine has run out of desktop heap space. This cause is especially likely if many processes that belong to your application are running on the node. | If your program wasn't built to respond to Ctrl+C signals, you can enable the EnableActivateNoWindow setting in the cluster manifest. With this setting enabled, your code package runs without a GUI window and doesn't receive Ctrl+C signals, which also reduces the amount of desktop heap space that each process consumes. If your code package needs to receive Ctrl+C signals, you can instead increase the size of your node's desktop heap.
3762504530 | 0xe0434352 | N/A | This value is the error code for an unhandled exception from managed code (that is, .NET). It indicates that your application raised an exception that was never handled, which terminated the process. | As the first step in determining what triggered this error, examine your application's logs and dump files.
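The decimal and hexadecimal columns are the same 32-bit value written two ways, and some tools show that value as a negative signed integer instead. The following sketch, with the hypothetical names `KNOWN_EXIT_CODES` and `describe_exit_code`, normalizes a code and looks it up against the four codes listed in this article. It is only a lookup over the tables above, not a complete catalog of exit codes.

```python
from typing import NamedTuple


class ExitCodeInfo(NamedTuple):
    name: str
    terminated_by_service_fabric: bool
    summary: str


# Entries copied from the two tables in this article.
KNOWN_EXIT_CODES = {
    7147: ExitCodeInfo(
        "N/A", True, "Service Fabric shut it down gracefully with Ctrl+C."
    ),
    7148: ExitCodeInfo(
        "N/A", True, "Service Fabric terminated the process or container."
    ),
    0xC0000142: ExitCodeInfo(
        "STATUS_DLL_INIT_FAILED", False, "Possibly out of desktop heap space."
    ),
    0xE0434352: ExitCodeInfo(
        "N/A", False, "Unhandled exception from managed (.NET) code."
    ),
}


def to_unsigned32(code: int) -> int:
    """Map a signed 32-bit exit code onto its unsigned representation."""
    return code & 0xFFFFFFFF


def describe_exit_code(code: int) -> str:
    """Format a code as 'decimal (hex)' followed by what this article says about it."""
    unsigned = to_unsigned32(code)
    prefix = f"{unsigned} (0x{unsigned:08x})"
    info = KNOWN_EXIT_CODES.get(unsigned)
    if info is None:
        return f"{prefix}: not listed here; not terminated by Service Fabric."
    who = "Service Fabric" if info.terminated_by_service_fabric else "not Service Fabric"
    return f"{prefix} [{info.name}, {who}]: {info.summary}"
```

For example, `describe_exit_code(-1073741502)` and `describe_exit_code(3221225794)` resolve to the same STATUS_DLL_INIT_FAILED entry.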

Next steps

  • Read the Azure Monitor overview for more detail on Azure Monitor logs and what they offer.
  • Learn more about Azure Monitor logs alerting to aid in detection and diagnostics.
  • Get familiar with the log search and querying features offered as part of Azure Monitor logs.