Diagnose common scenarios with Service Fabric
This article walks through common scenarios that users encounter when monitoring and diagnosing Service Fabric. The scenarios cover all three layers of Service Fabric: application, cluster, and infrastructure. Each solution uses Application Insights and Azure Monitor logs, both Azure monitoring tools, and the steps in each one introduce how to use these tools in the context of Service Fabric.
Note
This article was recently updated to use the term Azure Monitor logs instead of Log Analytics. Log data is still stored in a Log Analytics workspace and is still collected and analyzed by the same Log Analytics service. The terminology is being updated to better reflect the role of logs in Azure Monitor. See Azure Monitor terminology changes for details.
Prerequisites and recommendations
The solutions in this article use the following tools. We recommend that you have them set up and configured:
- Application Insights with Service Fabric
- Enable Azure Diagnostics on your cluster
- Set up a Log Analytics workspace
- Log Analytics agent to track performance counters
How can I see unhandled exceptions in my application?
Navigate to the Application Insights resource that your application is configured with.
Click Search in the top left, then click Filter on the next panel.
You will see many types of events (traces, requests, custom events). Choose "Exception" as your filter.
Click an exception in the list to see more details. If you are using the Service Fabric Application Insights SDK, the details include the service context.
How do I view which HTTP calls are used in my services?
In the same Application Insights resource, filter on "requests" instead of exceptions to view all requests made.
If you are using the Service Fabric Application Insights SDK, you can also see a visual representation of your services connected to one another, along with the number of succeeded and failed requests. To open it, click "Application Map" on the left.
For more information on the application map, see the Application Map documentation.
How do I create an alert when a node goes down?
Node events are tracked by your Service Fabric cluster. Navigate to the Service Fabric Analytics solution resource named ServiceFabric(NameofResourceGroup).
Click the graph at the bottom of the blade titled "Summary".
This view contains many graphs and tiles that display various metrics. Click one of the graphs to open the Log Search, where you can query for any cluster events or performance counters.
Enter the following query. The event IDs are listed in the Node events reference.
ServiceFabricOperationalEvent | where EventID >= 25622 and EventID <= 25626
Click "New Alert Rule" at the top. From then on, whenever an event that matches this query arrives, you receive an alert through your chosen method of communication.
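The filter in the query above can also be tried out locally before you create the alert rule. The following is a minimal Python sketch, not part of the Service Fabric or Azure Monitor tooling: the `select_node_events` helper and the sample records (including the `NodeName` field) are hypothetical, and only the inclusive ID range 25622 to 25626 comes from the query.

```python
from typing import Iterable

# Inclusive ID range copied from the query above (25622..25626).
NODE_EVENT_ID_RANGE = range(25622, 25627)


def select_node_events(events: Iterable[dict]) -> list[dict]:
    """Keep only the events whose EventID falls inside the node event range."""
    return [e for e in events if e.get("EventID") in NODE_EVENT_ID_RANGE]


# Hypothetical sample records; real rows carry many more columns.
sample_events = [
    {"EventID": 25621, "NodeName": "_Node_0"},
    {"EventID": 25622, "NodeName": "_Node_1"},
    {"EventID": 25626, "NodeName": "_Node_2"},
    {"EventID": 29623, "NodeName": "_Node_3"},
]

matched = select_node_events(sample_events)
print([e["NodeName"] for e in matched])
```

Only the two records whose IDs sit inside the range are kept, which mirrors the `>=` and `<=` bounds of the query.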
How can I be alerted of application upgrade rollbacks?
In the same Log Search window as before, enter the following query for upgrade rollbacks. The event IDs are listed in the Application events reference.
ServiceFabricOperationalEvent | where EventID == 29623 or EventID == 29624
Click "New Alert Rule" at the top. From then on, whenever an event that matches this query arrives, you receive an alert.
How can I monitor performance counters?
Once you have added the Log Analytics agent to your cluster, you need to add the specific performance counters you want to track. Navigate to the Log Analytics workspace's page in the portal. From the solution's page, the workspace tab is in the left menu.
On the workspace's page, click "Advanced settings" in the same left menu.
Click Data > Windows Performance Counters (Data > Linux Performance Counters for Linux machines) to start collecting specific counters from your nodes through the Log Analytics agent. The following examples show the format for counters to add:
.NET CLR Memory(<ProcessNameHere>)\# Total committed Bytes
Processor(_Total)\% Processor Time
In the quickstart, VotingData and VotingWeb are the process names used, so tracking these counters would look like this:
.NET CLR Memory(VotingData)\# Total committed Bytes
.NET CLR Memory(VotingWeb)\# Total committed Bytes
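If you track many processes, the counter entries can be generated rather than typed by hand. The following is a minimal Python sketch under two assumptions: the `counter_path` helper is hypothetical, and a single backslash separates the counter object from the counter name, as in standard Windows performance counter paths.

```python
def counter_path(counter_object: str, instance: str, counter: str) -> str:
    # Builds "Object(Instance)<backslash>Counter"; the "\\" below is one backslash.
    return f"{counter_object}({instance})\\{counter}"


# Process names from the quickstart.
process_names = ("VotingData", "VotingWeb")

paths = [
    counter_path(".NET CLR Memory", name, "# Total committed Bytes")
    for name in process_names
]
paths.append(counter_path("Processor", "_Total", "% Processor Time"))

for path in paths:
    print(path)
```

Each printed line can then be pasted into the performance counter settings of the workspace.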
Tracking these counters lets you see how your infrastructure is handling your workloads and set relevant alerts based on resource utilization. For example, you may want to set an alert if the total processor utilization goes above 90% or below 5%. The counter name to use for this is "% Processor Time". You could do this by creating an alert rule for the following query:
Perf | where CounterName == "% Processor Time" and InstanceName == "_Total" | where CounterValue >= 90 or CounterValue <= 5
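The threshold logic of this query can be checked against sample values before you rely on the alert. The following is a minimal Python sketch: the `breaches_threshold` helper and the sample records are hypothetical, while the column names and the 90 and 5 limits come from the query above.

```python
def breaches_threshold(record: dict, high: float = 90.0, low: float = 5.0) -> bool:
    """Mirror the query: total processor time at or above high, or at or below low."""
    return (
        record["CounterName"] == "% Processor Time"
        and record["InstanceName"] == "_Total"
        and (record["CounterValue"] >= high or record["CounterValue"] <= low)
    )


# Hypothetical samples; real Perf rows carry additional columns.
samples = [
    {"CounterName": "% Processor Time", "InstanceName": "_Total", "CounterValue": 95.0},
    {"CounterName": "% Processor Time", "InstanceName": "_Total", "CounterValue": 42.0},
    {"CounterName": "% Processor Time", "InstanceName": "_Total", "CounterValue": 3.0},
    {"CounterName": "% Processor Time", "InstanceName": "0", "CounterValue": 99.0},
]

alerts = [s["CounterValue"] for s in samples if breaches_threshold(s)]
print(alerts)
```

The last sample is skipped because its instance is a single core rather than `_Total`, which is the same restriction the query applies.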
How do I track performance of my Reliable Services and Actors?
To track the performance of Reliable Services or Actors in your applications, you should also collect the Service Fabric Actor, Actor Method, Service, and Service Method counters. The following are examples of Reliable Service and Actor performance counters to collect:
Note
Service Fabric performance counters cannot currently be collected by the Log Analytics agent, but they can be collected by other diagnostic solutions.
Service Fabric Service(*)\Average milliseconds per request
Service Fabric Service Method(*)\Invocations/Sec
Service Fabric Actor(*)\Average milliseconds per request
Service Fabric Actor Method(*)\Invocations/Sec
See the linked references for the full list of performance counters on Reliable Services and Actors.
Next steps
- Look up common code package activation errors.
- Set up alerts in Application Insights to be notified about changes in performance or usage.
- Smart Detection in Application Insights proactively analyzes the telemetry sent to Application Insights to warn you of potential performance problems.
- Learn more about Azure Monitor logs alerting to aid in detection and diagnostics.
- For on-premises clusters, Azure Monitor logs offers a gateway (HTTP forward proxy) that can be used to send data to Azure Monitor logs. Read more in Connecting computers without Internet access to Azure Monitor logs using the Log Analytics gateway.
- Get familiar with the log search and querying features offered as part of Azure Monitor logs.
- For a more detailed overview of Azure Monitor logs and what it offers, read What is Azure Monitor logs?